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RANDOM  VARIABLES  AS  A DATA  TYPE* 

ABSTRACT 

This  report  discusses  the  use  of  random  variables  as  a data  type  for  programming 
languages.  It  demonstrates  that  for  complex  programs  the  results  of  the  use  of 
random  variables  are  non-computable.  After  imposing  restrictions  on  the  class  of 
programs  to  obtain  a practical,  although  limited  class  of  programs,  we  discuss  the 
major  problems  of  constructing  a statistical  compiler  which  accepts  distributions 
for  its  input  variables,  and  produces  the  distribution  of  its  output  variables.  Both 
simplification  rules  and  representation  techniques  for  such  a compiler  are  de- 
scribed. A simple  example  of  such  a compiler  which  has  been  implemented  is  de- 
scribed, and  the  problems  in  extending  the  implor,  ation  are  explored. 

Directions  for  future  research  work  in  this  area  and  techniques  for  evaluating  the 
utility  of  this  approach  are  discussed. 


*This  report  is  based  on  a thesis  of  the  same  title  submitted  to  the  Division  of 
Engineering  and  Applied  Physics  at  Harvard  University  on  May  1976  in  partial 
fulfillment  of  the  requirements  for  the  degree  ot  Doctor  of  Fhilosophj . 
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RANDOM  VARIABLES  AS  A DATA  TYPE 


I.  INTRODUCTION 

Consider  an  airplane  decelerating  on  a runway.  Given  the  value  of  such  variables  as 
deceleration  rate  (perhaps  time-dependent),  touchdown  speed,  and  touchdown  point,  one  could 
write  a program  P to  calculate  the  distance  the  plane  had  tra\  cled  down  the  runway  before  it 
reached  a safe  velocity  for  turnoff  (perhaps  40  knots).  Suppose  that  an  airport  designer  has  to 
place  an  exit  taxiway  at  some  distance  down  the  runway.  If  the  taxiway  is  too  far  down  the  run- 
way, then  planes  will  spend  a longer  time  on  the  runway  than  they  should,  whereas,  if  it  is  not 
far  enough,  then  many  planes  will  not  be  at  a safe  velocity  for  turning  by  the  time  they  reach 
the  exit.  The  values  of  the  touchdown  speed  and  the  location  of  the  touchdown  point  are  not  the 
same  for  every  plane  which  lands,  but  their  distribution  can  be  measured.  Similarly,  although 
the  measurement  problem  is  harder,  one  can  measure  the  deceleration  of  planes  on  the  runway 
or  one  can  assume  values  from  the  known  characteristics  of  various  models  of  airplanes.  The 
program  P that  was  originally  written  describes  how  to  calculate  the  distance  at  which  to  place 
the  exit  for  some  sp -cific  values  of  the  input  variables,  but  the  information  that  the  airport 
designer  really  wante  is  at  what  distance  will  say  95  percent  of  the  planes  be  able  to  turn  safely. 
(Presumably  he  will  provide  other  exits  for  those  unable  to  turn  at  this  strategically  placed 
exit.) 

The  techniques  described  in  this  report  will  allow  the  airport  designer  to  come  to  the  com- 
puter with  the  program  which  calculates  the  result  for  a particular  set  of  values  and  information 
regarding  the  distributions  of  the  set  of  values  for  the  input  variables.  The  result  will  be  infor- 
mation regarding  the  distribution  of  the  set  of  values  for  the  output  variables. 

Thus  the  programmer  can  think  of  the  behavior  of  one  individual  in  the  population  of  interest 
and  write  his  programs  to  describe  that  behavior.  Then,  in  a separate  step,  he  can  describe 
the  relevant  characteristics  of  his  population  using  the  appropriate  statistical  tools.  The  ma- 
chine car  then  combine  this  information  to  obtain  a description  of  the  output  population. 

Mere  specifically,  this  work  aims  toward  an  alternate  mode  of  computation  which  would 
allow  a user  to  request  that  his  program  be  compiled  into  code  which  will  accept  statistical 
descriptions  of  his  input  variables  and  will  produce,  when  executed,  statistical  descriptions  of 
the  output  variables. 

Formally  then,  given  a program  P (i.e.,  an  ordered  set  of  instructions  for  calculating  a 
set  of  output  values  from  a set  of  input  values),  I wish  to  "compile"  that  program  to  obtain 
another  program  P',  whose  inputs  are  statistical  descriptions  of  the  sets  of  values  assumed  by 
the  inputs  to  P and  whose  output  is  a statistical  description  of  the  sets  of  values  assumed  by 
the  outputs  of  I’.  We  refer  to  the  program  P'  as  the  statistical  analog  of  P,  and  the  process  of 
producing  this  program  will  be  referred  to  as  a statistical  compilation  of  P. 

Another  application  of  this  work  is  in  cost  forecasting.  Suppose  that  1 am  an  entrepeneur 
thinking  about  selling  widgets.  The  process  used  to  produce  widgets  is  not  fully  reliable,  and  in 
each  batch  of  widgets  produced  only  about  80  percent  are  acceptable.  Moreover,  this  yield  is 
variable;  on  some  days,  only  10  percent  of  the  widgets  will  be  acceptable,  while  on  others 
100  percent  will  be.  Also,  I know  the  current  cost  of  raw  materials  and  labor,  but  I want  to  be 
able  to  estimate  what  to  charge  for  widgets  a year  from  now  when  these  costs  will  have  changed. 

I am  able  to  guess  at  distributions  for  these  costs  in  a year  (in  the  jargon  of  the  decision 
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analyst,  I am  willing  to  assess  a prior  judgmental  distribution  for  these  values).  I would  like 
to  estimate  what  the  manufacturing  costs  per  widget  produced  will  be.  The  mean  value  of  these 
costs  will  allow  me  to  calculate  my  expected  profit,  while  the  weight  in  the  upper  tail  of  the  dis- 
tribution will  tell  me  my  risk.  Of  course,  two  of  my  variables,  namely  yield  and  cost  of  mate- 
rials and  labor,  are  only  known  statistically.  It  is  simple  to  write  a program,  assuming  T 
know  the  yield  and  the  costs  of  materials  and  labor,  to  obtain  the  manufacturing  cost  per  widget. 
I would  like  to  automatically  combine  this  program  with  the  distributions  for  yield  and  costs  to 
obtain  a distribution  for  the  manufacturing  cost  per  acceptable  widget. 

Another  way  to  view  this  report  is  in  terms  of  data  types  of  variables  in  a programming 
language.  Any  number  of  standard  data  types  exist  in  common  programming  languages  (e.g., 
FORTRAN,  PL/l).  The  extensible  languages  provide  the  programmer  with  mechanisms  for 
providing  additional  data  types  as  needed  (see  [Standish  75]  for  a survey  of  work  in  this  area). 
What  we  are  proposing  in  this  thesis  may  be  viewed  as  a new  data  type,  which  I shall  call  random. 
This  may  be  used  in  conjunction  with  any  of  the  standard  types  as  in  rcal\random  or  inl\ramiom  or 
rcal\array\random.  A variable  of  type  random  must  be  represented  internally  by  some  representation 
of  the  distribution  for  the  variable.  The  internal  representation  might  be  in  terms  of  the  den- 
sity function,  the  cumulative  density  function,  or  perhaps  some  alternate  form  such  as  moments. 
Moreover,  the  representation  may  be  symbolic,  i.e.,  the  system  implementor  might  choose  to 
represent  the  random  variable  by  a function  which  calculates  the  density  function. 

Regardless  of  which  representation  is  chosen,  the  implementor  may  now  proceed  to  define, 
in  an  extensible  language  environment,  exactly  what  is  meant  by  our  new  data  types.  For  ex- 
ample, the  sum  of  a real  x and  a realVaiidom  y would  be  defined  to  be  of  type  rcal\random,  and  its 
representation  would  be  the  representation  of  a distribution  which  is  that  of  y but  shifted  by  the 
amount  x.  Similarly,  forming  the  sum  of  two  rfal\ramiom  variables  would  cause  the  invocation 
of  a procedure  to  calculate  the  representation  for  the  convolution  of  their  respective  density 
functions. 

However,  as  the  implementor  discovers  fairly  rapidly,  the  construction  of  such  a system 
runs  into  many  difficulties.  Consider,  for  example,  the  following  program  (the  syntax  is  that 
of  ELI  [Wegbreit  74),  an  extensible  language  in  use  at  Harvard) 

DECL  X,  Y,  Z:  real\random; 

DECL  A,  B.  reaJ\random; 

A <-  gaussian(0,  3); 

/*  'A  is  assigned  a normal  distribution  of  mean  0 and  sigma  of 
3’; 

B <—  gaussian(2,  4); 

X <-  A + B; 

1 * 'X  is  thus  a normal  distribution  of  mean  2 and  sigma  of  5'; 

Y <- A + B; 

Z <-X  - Y; 

/*  'Z  is  properly  a point  distribution  with  probability  1 of  having 

the  value  O'; 
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ol  that  unless,  at  the  time  that  Z is  being  computed,  the  dependence  of  the  distributions 
for  Xu  Y is  included  in  the  computation,  the  result  would  be  incorrect.  Although  X and  Y 
are  both  gaussian(2,5),  it  is  not  correct  to  write  Z as  gaussian(2,5)  - gaussian(2,5)  which  would  eval- 
uate to  gaussifltt(0,S\/2).  We  assume  that  successive  calls  to  gaussian  produce  distributions  which 
are  statistically  independent.  However,  the  variables  X and  Y are  not  statistically  independent 
and  this  dependency  causes  the  different  result. 

The  thrust  of  this  report  is  the  extension  and  unification  of  a series  of  attacks  by  different 
authors  on  the  problems  of  computing  with  data  which  are  only  known  statistically.  Because  of 
the  wide  variety  of  statistical  representations  of  data  which  are  available,  previous  work  has 
varied  quite  broadly  in  the  approach  to  the  problem,  depending  on  the  assumed  representation. 
Thus,  we  have  at  one  extreme,  a description  of  a data  value  by  the  upper  and  lower  bounds  of 
the  interval  that  contains  it  as  in  the  interval  arithmetic  of  Moore  | Moore  66],  while  at  the 
other  extreme  is  Monte  Carlo  simulation  where  the  entire  distribution  function  of  the  data  is 
used.  Below  we  survey  some  of  the  more  well-known  attacks  on  this  type  of  computation  as 
well  as  some  of  the  work  which  is  closer  in  flavor  to  this  report. 

Interval  Analysis 

The  ideas  expressed  above  may  be  viewed  as  a generalization  of  the  interval  arithmetic  of 
Moore  [Moore  66].  In  his  work,  he  limits  the  descriptions  of  his  input  variables  to  simply  an 
upper  and  lower  bound  and  attempts  to  calculate  upper  and  lower  bounds  on  the  output  variables 
which  are  as  tight  as  possible.  The  automatic  methods  usually  provided  are  a package  of  sub- 
routines which  are  invoked  to  perform  interval  addition  when  addition  is  called  for  in  the  pro- 
gram, and  similarly  for  interval  subtraction,  multiplication,  and  division.  This  results  in  a 
simple  implementation  of  the  interval  scheme,  but  ignores  information  regarding  the  joint  de- 
pendencies of  variables.  As  a simple  example  of  this  consider  the  usual  identity 

(a  + b)  x = ax  + bx 

Suppose 

a = [l  2] 
b = [-3  -2] 

x=[3  5] 

then 

(a  + b)  = [-2  0] 

(a  + b)  x = [-10  0] 

whereas 

ax  = [3  10] 

bx  = [-15  -6  ] 

ax  + bx  = [-12  4] 

Note  that  the  reason  that  the  right  side  provides  such  poor  results  is  due  to  failure  to  take 
into  account  that  x must  have  the  same  value  both  times  it  is  evaluated.  That  is,  ax  and  bx 
may  not  be  evaluated  independently  (i.e.,  as  if  it  were  ax  and  by  where  x and  y nave  the  same 
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interval  description)  if  one  is  to  obtain  a tight  bound  on  the  result  for  ax  + bx.  In  general,  the 
failure  to  take  cognizance  of  these  dependencies  broadens  the  bounds  which  are  obtained  by 
interval  analysis  techniques.  One  can  show  indeed  that  for  any  intervals  I,  J,  K 

I(J  + K)  C IJ  + IK 

although  if  JK  > 0 (an  interval  Is  greater  than  zero  if  its  lower  bound  is  greater  than  zero), 
then  equality  holds.  Depending  on  the  extent  of  these  dependencies,  the  bounds  obtained  may 
or  may  not  be  practically  useful. 


Monte  Carlo  Simulation 


Somewhat  at  the  other  extreme  of  what  can  be  done  are  the  techniques  of  Monte  Carlo  simu- 
lation IHammersley  64).  Here  the  assumption  is  that  distributions  are  provided  for  each  of  the 
input  variables  and  sample  sets  of  input  values  are  then  drawn  from  the  input  distributions. 

The  resulting  values  are  then  used  in  the  computation  in  order  to  obtain  an  output  value.  Ihe 
process  is  repeated  many  times  in  order  to  build  up  the  statistics  of  the  output  variable.  By 
clever  sampling,  it  is  often  possible  to  obtain  the  output  distribution  much  more  efficiently 
than  "crude"  sampling  techniques  would  indicate.  Still  there  are  a number  of  limitations  of  the 
technique 

(1)  It  requires  a distribution  for  the  input  variables,  rather  than  some  re- 
duced amount  of  information  (i.e.,  the  first  four  moments). 


(2)  It  requires  many  executions  of  the  program  to  build  the  desired  mea- 
sures of  the  output  variables  to  the  necessary  precision. 

(3)  It  requires  generating  random  samples  from  what  may  be  complex  and 
interdependent  distributions. 

(4)  The  program  is  viewed  as  a black  box,  and  any  information  regarding 
its  structure  is  ignored. 


Note  however,  that  there  are  no  problems  due  to  dependencies  between  variables  in  the 
program.  Since  each  execution  of  the  program  reflects  the  dependencies  which  exist,  the  re 
suiting  output  values  also  accurately  reflect  those  dependencies. 


Franson's  Work 

Franson,  in  a Master's  thesis  [Franson  69]  at  the  Naval  Postgraduate  School  in  Monterey, 
California,  formulates  some  basic  rules  for  manipulating  probability  distributions  to  attack 
problems  similar  to  the  ones  discussed  here.  The  compilation  techniques  are  essentially 
manual  while  the  computation  algorithms  invoked  are  fairly  straightforward  numerical  integra- 
tions of  the  convolution  integrals  obtained.  In  addition,  the  techni  |Ue  does  not  handle  depen- 
dencies among  the  variables. 

LISADICS  Simulator 

A group  at  the  University  of  Notre  Dame  has  developed  a simulation  for  the  Law  Engineer- 
ing Analysis  of  Delay  in  Court  Systems  (LEADICS)  project  jSain  73).  They  model  each  step 
through  a criminal  justice  system  by  a representation  of  the  probability  distribution  of  the  delays 
through  that  step.  An  individual  leaving  one  step  of  the  criminal  justice  system  may  directly 
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proceed  to  the  next  step  or  pass  through  a multiway  branch  which  has  a probability  associated 
with  each  path,  or  through  a feedback  path.  Their  simulator  combines  the  distributions  at 
each  step  to  produce  the  delay  distribution  for  the  entire  system.  They  represent  tl  e delay 
distribution  by  a rational  fraction  approximation  to  the  characteristic  function  and  manipulate 
it  to  represent  the  operations  ot  addition,  branching,  and  feedback. 

Their  simulation  handles  only  a very  restricted  set  of  operations  on  the  distributions  and 
it  is  not  obvious  how  to  extend  it.  Moreover,  they  have  completely  ignored  effects  due  to  cor- 
relation between  the  distributions  in  their  model;  i.c.,  if  the  time  from  the  commission  of  a 
crime  to  arrest  is  long,  is  the  trial  also  likely  to  be  long?  In  general,  these  possible  correla- 
tions aftect  their  results  in  an  unknown  fashion  and  must  be  included  in  a more  careful  analysis. 

Berzins  Work 

Of  all  the  work  I am  aware  of,  a recent  Master's  thesis  by  Berzins  [Berzins  75]  represents 
the  most  sophisticated  development  in  this  area.  His  work  was  performed  in  the  context  of  an 
eitort  to  build  an  automatic  programming  system  which  will  generate  programs  implementing 
business  information  systems  (called  Protosystem  I).  Part  of  this  system  is  an  optimizer  for 
choosing  data  representations  and  program  structures.  This  requires  a "question  answerer" 
which  will  supply  information  about  the  expected  behavior  of  the  information  system, 

His  construction  of  such  a question  answerer  by  techniques  which  are  cleverer  than  Monte 
Carlo  analysis  runs  into  many  of  the  problems  addressed  in  this  thesis.  Bis  system  consists 
of  a number  of  pieces  including  an  interval  analysis  package,  a Monte  Carlo  simulator,  and  a 
package  for  the  analysis  of  random  variables  by  the  manipulation  of  distributions,  llis  work  on 
random  variables  provides  an  implementation  which  differs  from  that  described  in  this  thesis 
in  a number  of  ways:  (1)  In  my  terms,  the  class  of  base  functions  he  has  chosen  are  plus  and 
a conditional  of  the  form  ij  * N v then  u else  However,  when  he  needs  the  analysis  of  more 
complicated  situations  such  as  looping  or  multiplication,  he  uses  the  Monte  Carlo  simulator  in 
the  environment  to  obtain  an  approximate  answer  and  then  continues,  and  (2)  his  handling  of 
dependent  variables  is  by  maintaining  a measure  of  the  degrees  of  freedom  of  his  input  vari- 
ables (the  "effective  number")  and  using  this  at  various  points  to  improve  his  estimate  of  the 
answer,  lie  clearly  recognizes  this  as  an  inadequate  solution  to  the  problem  for  he  states: 

[In  an  operation,  if)  the  input  variables  are  interdependent,  then  the 
correlations  get  very  complicated  and  even  though  they  are  not  necessarily 
small  effects,  I do  not  see  a practical  way  of  taking  them  into  account.  In 
these  cases,  1 recommend  treating  the  outputs  as  though  they  were  independent 
even  though  they  arc  not. 

It  is  probably  a good  idea  to  put  in  a test  for  the  troublesome  case,  and 
to  print  a warning  that  an  unsafe  answer  has  been  produced.  . . | BCrzins  75, 
pp.  199-2  00]. 

Overview  of  Document 

1 begin  by  proving  (Sec.  11,  Theorem  1)  that  the  problem  stated  in  its  most  general  form  is 
non-computable  in  the  classic  sense  of  being  equivalent  to  the  halting  problem  for  Turing 
machines.  One  approach  to  circumventing  this  result  is  to  reduce  the  class  of  programs  which 
may  be  considered  but  provide  exact  techniques  for  that  reduced  class.  It  is  this  approach  which 


is  explored  in  this  report.  There  are  however  other  approaches  possible.  One  such  approach 
is  to  handle  all  functions  with  a weaker  description  of  the  random  variables  than  their  distri- 
bution as  is  done  in  interval  analysis.  Another  approach  is  to  accept  some  theoretical  inaccu- 
racies in  the  result  as  in  Monte  Carlo  simulation. 

Accordingly,  we  restrict  our  attention  to  a class  of  functions  which  may  be  described  as 
trees  (that  is,  they  contain  no  loops  at  all).  In  Sec.  II,  I compare  this  class  of  functions  to  the 
various  hieraichies  of  computational  complexity  of  functions  which  have  been  explored  in  the 
literature. 

In  Sec.  IV,  I discuss  the  simplification  rules  for  a statistical  compiler,  while  Sec.  V ex- 
plores the  variety  of  representation  techniques  which  might  be  employed.  These  topics  are  the 
key  design  issues  faced  by  the  implementor  of  a statistical  compiler. 

In  Sec.  VI,  I switch  gears  to  a more  experimental  approach  and  describe  an  implementation 
of  a prototype  system  for  statistical  computations  which  I have  constructed  in  ECL  at  Harvard. 

Section  III  states  my  approach  to  the  problem  of  producing  statistically  analogous  computa- 
tions for  any  tree-type  computation.  Here  I indicate  why  a compilation  approach  is  necessary 
and  delineate  the  key  issues  for  constructing  such  a compiler.  Section  VII  discusses  what  would 
be  required  to  extend  this  work  to  provide  a practical  general-purpose  system  (i.e.,  a FORTRAN 
with  random  variables  as  a data  type).  Finally,  Sec.  VIII  is  some  thoughts  about  future  work  in 
this  area. 

The  main  result  is  a new  technique  for  attacking  this  type  of  problem.  In  addition. 

Theorem  1 in  Sec.  II  appears  here  for  the  first  time  although  the  result  is  due  to  D.  Tsitchritzis. 
The  Rules  4 and  5 in  Sec.  IV  are  new  here,  while  Rule  3,  although  discovered  independently,  is 
a restatement  of  already  known  results.  The  computational  techniques  in  Sec.  V which  use 
moments  are  new  to  the  computer  science  field,  primarily  because  of  the  discussion  of  numer- 
ical errors  in  the  solution  of  the  moment  problem,  although  these  results  have  all  appeared  in 
the  mathematical  literature. 
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II.  PROBLEM  REDUCTION 


Let  me  begin  with  a more  formal  statement  of  the  problem.  I wish  to  construct  a program  C 
which  accepts  the  following  items  as  inpit: 

f - a program  which  is  the  computational  description  of  some  mathematical 
function  with  a real-valued  result,  drawn  from  a class  F. 

Pi  - a set  of  programs  which  are  computational  descriptions  of  the  probability 
distributions  for  the  input  variables  of  f (which  I will  assume  independent 
for  this  section).  1 make  the  further  assumption  that  the  p are  discrete 
distributions,  an  assumption  I shall  relax  shortly. 

z - a value  chosen  from  the  set  of  possible  output  values  for  f. 

The  value  returned  by  C(f,  p.,  z)  is  the  probability  that  the  result  of  f will  be  less  t)  or 
equal  to  z given  that  the  probabilities  of  the  input  variables  are  as  described  by  the  p 's. 

Note  that  this  view  of  C is  not  intended  to  preclude  possible  partitions  of  the  work  performed 
by  C.  C might  be  implemented,  for  example,  as  a "statistical  compiler"  SC,  whose  input  is 
merely  f and  whose  output  is  an  f ,{pi,  z)  which  when  fed  the  p 's  and  z can  complete  the  calcu- 
lation. Thus,  we  might  write  C(f,  p . z)  = (SC(f)J  (p.,  z).  Yet  another  partition  of  C provides  a 
"Distribution  Calculator"  DC,  whose  input  is  f and  the  p.'s  and  whose  result  is  a computational 
description  of  the  probability  distribution  for  z.  That  is,  C(f,  p , z)  = [DC(f,  p ))  (z). 

As  a very  simple  examp'e  of  what  is  intended  here,  consider  f(a,  ul  = a + b.  Recalling  that 
the  probability  density  function  of  the  sum  of  two  independent  random  variables  is  the  convolu- 
tion of  the  probability  density  functions  of  the  random  variables,  we  have  SC  producing  a gen- 
eral convolver.  That  is,  SC  produces  f'  whose  inputs  are  two  functions  and  a specific  value  z 
and  evaluates  the  convolution  of  those  functions  at  the  point  z.  DC,  on  the  other  hand,  accepts 
as  input  the  function  f and  in  addition  descriptions  of  pg  and  pfe  and  produces  the  distribution  of 
the  result  (perhaps  using  FFT  techniques). 

Assuming  the  class  of  functions  F is  sufficiently  broad,  we  may  prove  that  C cannot  exist, 
regardless  of  which  implementation  form  is  chosen.  Consider  Kleene's  predicate  T(x,  y,  z)  = 0 
if  Turing  machine  x starting  with  v on  its  input  tape  cfops  at  exactly  z steps;  1 othe.  wise. 
Consider  a restricted  form  of  Kle  :ne's  predicate  Ay.  T(x,  x,  y)  = yy).  That  is,  we  look  at  a 
function  which  for  a specific  Turing  machine  x started  with  x on  its  input  tape  returns  0 if 
the  machine  will  stop  at  exactly  y steps  and  1 otherwise.  Then  we  have  the  following  theorem: 

Theorem  1.  (D.  Tsitchritzis) 

If  F includes,  for  every  Turing  machine  x.  Ay.  T(x,  x,  y)  and  py  is  discrete,  then  C can- 
not produce  correct  output  for  all  ft  F. 

Proof. 

Choose  Py  so  that  every  positive  integer  n has  some  probability  of  occurrence.  For  ex- 
ample, such  a choice  might  be  py(n)  = 2"*n+1)  for  positive  integral  n. 

Now  Turing  machine  x either  halts  with  x on  its  input  tape,  or  it  doesn't.  Suppose  it 
doesn't.  Then  C[Ay.  T(x,  x,  y),  p„,  0 j will  be  zero;  that  is,  if  the  machine  x never  halts,  then 

J 

the  probability  of  it  halting  at  y steps  for  any  y is  zero.  On  the  other  hand,  if  Turing  ma- 
chine  x halts  after  q steps,  then  C[Ay.  T(x,  x,  y),  0)  will  evaluate  to  2"(cI+1).  That  is,  if 
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machine  x halts  at  q steps,  then  the  probability  of  Xy.  T(  x,  x,  y)  reporting  its  halting  is  p (q), 
the  probability  assigned  to  the  occurrence  of  he  value  q for  the  variable  y.  Therefore,  the  test 
comparing  C[Xy.  T(x,  x,  y),  p^,  0|  to  0 represents  the  test  that  Turing  machine  x ever  halts. 
Thus,  C represents  a solution  to  the  halting  problem  and  therefore,  cannot  exist. 

The  extension  to  continuous  distributions  follows  directly. 

Corollary. 

If  F includes,  for  every  Turing  machine  x,  Xy.  T(x,  x,  I y)  (where  [_y  is  the  "floor"  of  /, 
the  greatest  integer  less  than  or  equal  to  y),  then  C cannot  produce  correct  output  for  all  fcF. 

Proof. 

Same  as  the  previous  proof  except  choose  py  to  be  a continuous  distribution  with  positive  i 
weight  in  every  interval  [i,.  +1)  for  every  positive  integer;. 

As  a matter  of  strategy,  we  must  now  consider  what  useful  options  are  left  to  explore  in 
spite  of  the  negative  result. 

One  difficulty  was  that  we  allowed  a distribution  which,  although  finitely  describable,  had 
positive  weight  at  an  infinite  number  of  points.  Were  there  only  a finite  number  of  points  in  p , 
the  function  C could  be  written  as  follows:  y 

(A)  Build  a simulator  for  Turing  machine  x 

(B)  For  q increasing  by  1 do 

(C)  Simulate  step  q 

(D)  If  Turing  machine  x halts,  output  p (q) 

(E)  Accumulate  py(q)  into  p 

(F)  If  p equals  1,  terminate;  else,  loop. 

This  must  complete  in  a finite  time,  since  only  a finite  number  of  values  of  q must  be 
explored. 

Similarly,  if  we  are  willing  to  accept  a result  within  6 > 0 of  the  true  answer,  where  6 is 
given  as  an  additional  input  to  C,  we  can  solve  the  problem  in  similar  fashion.  Merely  change 
the  test  in  step  (F)  above  to  p greater  than  1-6.  Since  at  some  finite  point,  the  weight  of  the 
tail  of  the  p^  distribution  must  be  less  than  6 this  process  must  also  terminate. 

Finally,  another  approach  we  may  consider  is  to  restrict  the  class  F so  as  not  to  include 
Xy.  T(x,  x,  y)  for  all  x.  Since  the  proof  depends  on  the  use  and  properties  of  Kleene's  predicate, 
the  proof  presented  would  not  work  in  this  case,  and  indeed,  the  remainder  of  the  report  will  be 
devoted  to  describing  techniques  which  may  be  employed  for  a class  F which  does  not  contain 
Kleene's  predicate. 

I choose  this  approach  to  attacking  the  problem  for  the  following  reasons: 

1.  Monte  Carlo  techniques  very  effectively  attack  the  case  where  we  are 
willing  to  get  an  answer  which  is  within  6 of  the  true  answer,  although 
the  computation  time  increases  without  limit  as  6 approaches  zero. 

2.  The  restriction  to  a finite  distribution  allows  some  trivial  techniques 
to  be  used  which  can  b>-  very  expensive  to  compute.  Any  ways  around 
these  expensive  computations  that  I have  attempted  to  devise  have  not 
crucially  depended  upon  the  finiteness  of  the  distribution.  They  are 
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thus  liable  to  being  applied  to  infinite  distributions  and  are  therefore,  by 
Theorem  1,  not  applicable  to  complicated  programs, 

3.  My  restricted  class  of  functions  is  still  large  enough  to  be  interesting 
both  theoretically  and  practically. 

Computation  Trees 

Therefore,  we  look  for  some  restriction  which  /ill  eliminate  Klecne's  predicate  from  our 
class  of  functions.  We  note  that  computing  Kleene's  predicate  involves  (1)  simulation  of  a single 
step  of  any  arbitrary  Turing  machine,  (2)  looping  for  the  number  of  steps  specified  in  the  input, 
and  (3)  testing  for  a halting  configuration  of  the  Turing  machine.  Due  to  the  simplieity  of  Turing 
machines,  (1)  and  (3)  are  exceedingly  simple  to  perform  and  any  language  which  cxeludes  them 
is  probably  uninteresting.  In  addition,  during  the  preliminary  work  on  this  report,  the  language 
constructs  which  caused  the  largest  amount  of  difficulty  in  developing  algorithms  to  handle  random 
variables  were  the  constructs  involved  with  looping,  i.e.,  FOR,  GO  TO,  HillL.E,  ete.  Thus,  we 
choose  as  our  restricted  language  a computation  form  whose  programs  may  be  represented  as 
trees  (no  loops  or  cycles  are  permitted)  and  call  these  computation  trees. 

Formally,  given  a base  class  of  functions  !f,  we  may  talk  about  the  class  of  computation 
trees  n based  on  functions  from  ‘f.  The  nodes  in  the  tree  will  indicate  variables  and  the  edges 
will  indicate  the  dependence  among  variables.  Each  node  which  is  not  a terminal  node  will  also 
indicate  a function  from  the  class  !f  which  is  to  be  applied  to  the  values  assigned  to  the  ances- 
tors of  this  node  to  compute  the  value  at  the  node.  In  addition,  each  node  will  indicate  an  order- 
ing among  its  ancestor  nodes  to  c orrespond  to  the  arguments  to  the  function.  For  simplieity, 

1 will  consider  a single  root  node  which  is  the  output  variable  of  the  tree.  Some  typical  trees 
are  shown  in  Fig.  11-1.  We  think  of  arcs  in  the  tree  as  directed  from  the  terminal  nodes  toward 


Fig. II- 1. 

the  root.  To  each  node,  we  may  assign  a level  number  which  is  the  length  of  the  longest  path 
from  any  terminal  node  to  that  node.  Then  the  computation  to  evaluate  the  output  from  a given 
set  of  input  values  proceeds  as  follows: 

(1)  Assign  the  input  values  to  the  corresponding  terminal  nodes  in  the  tree 
and  mark  those  nodes  evaluated. 

(2)  c hoose  any  unevaluated  node  whose  level  is  less  than  or  equal  to  the 
level  of  all  other  unevaluated  nodes. 
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(3)  Evaluate  the  chosen  node  by  evaluating  the  function  assigned  to  the  node 
and  using  as  arguments  the  values  assigned  to  the  ancestor  nodes  in  the 
order  indicated.  Since  the  node  chosen  for  evaluation  is  of  lowest  pos- 
sible level,  all  nodes  at  lower  levels  have  already  beer,  evaluated. 

Therefore,  since  a node's  level  must  always  be  greater  than  the  level 
of  any  of  its  ancestors  all  needed  values  are  already  evaluated. 

(4)  If  the  node  just  evaluated  is  the  root  node,  then  we  are  finished;  other- 
wise, go  back  to  step  (2). 

Since  a tree  contains  only  a finite  number  of  nodes,  the  evaluation  process  must  complete, 
assuming  the  evaluations  of  all  the  base  functions  terminate. 

The  class  ft  then  is  the  smallest  class  of  functions  which 

(1)  includes  the  functions  of  :f 

(2)  is  closed  under  functional  composition  and  selection. 

The  development  in  Sec.  IV  will  be  independent  of  the  choice  of  the  class  ‘f  but  will  assume 
that  we  know  how  to  perform  certain  basic  operations  related  to  the  functions  in  :f.  More  spe- 
cifically, if  fc  ff,  we  must  know  computational  techniques  for  evaluating 

P(K)  = ^ ( p(a,  b,  c, . . . ) da  db  dc.  . . (1 ) 

a,  b,  c, . . . 

3f(a,  b,  c,  . . . ) = k 


where  p(a,  b,  c,  . . . ) is  either  a product  of  input  distributions  or  is  the  product  of  other  similar 
computations.  For  example,  if  f in  Eq.  (1)  is  addition,  then  the  operation  intended  is  bqtter 
known  as  convolution,  while  if  f is  subtraction  then  the  operation  is  correlation.  As  f becomes 
more  complicated,  the  computations  invoked  by  Eq.  (1)  become  quite  difficult.  We  will  refer  to 
this  type  of  computation  throughout  this  report  as  f-eonvolution  because  of  its  similarity  to  the 
usual  convolution  operations. 

In  Sec.  V,  we  will  discuss  how  to  compute  (1)  for  several  "simple"  functions  which  we  in- 
clude in  If  (the  constants,  addition,  subtraction,  multiplication,  etc.).  As  more  computational 
techniques  become  available  the  class  of  functions  which  can  be  computed  by  the  trees  in  ft  ex- 
pands correspondingly.  In  particular,  the  inclusion  of  a conditional  function  of  the  form 
f(x,y,z) - Jf  x 0 THEN  y LISE  z 

would  be  particularly  interesting  since  it  would  provide  conditional  expressions  in  ft.  We  sug- 
gest in  Sec.  V some  computational  devices  for  handling  this. 

Let  us  compare  the  possible  classes  ft  to  some  of  the  hierarchies  of  computational  complex- 
ity which  have  been  studied  in  the  literature.  This  will  help  to  elucidate  the  restrictions  imposed 
on  ft  by  its  definition  and  also  will  indicate  which  functions  might  be  in  ;t . Meyer  and  Hitchie  in 
a 1967  paper  [Meyer  67]  have  studied  the  loop  hierarchy  (V'J.  The  class  ft  cannot  be  extended 
to  include  all  the  elementary  functions  since  Kleene's  predicate  is  known  to  be  elementary 
[ llitchie  63;.  '1  hus,  ft  ~Z)  i’2  since  i.,  is  the  class  of  elementary  functions,  i'^,  however,  may 

be  characterized  as  the  smallest  class  containing  the  constants,  addition,  proper  subtraction, 
modulo,  integer  division,  and  closed  under  functional  composition  and  conditional  expressions. 
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If  we  can  extend  If  to  include  the  functions  mentioned,  then  BDi'j.  Moreover,  it  might  well 
be  possible  to  include  exponentiation  in  If  and  this  would  result  in  fi  being  greater  than  i’j  3ince 
the  exponentials  are  not  in  A’^. 

Another  well-known  hierarchy  is  the  Grzegorczyk  hierarchy  f£n).  For  n > 2,  it  has  been 
shown  [Meyer  67]  to  be  equivalent  to  the  loop  hierarchy,  i.e.,  6 = V . In  particular,  then 
f’3  is  the  elementary  functions  and  thus  n *'2  mav  be  characterized  [Cobham  64]  as  the 

smallest  class  containing  the  successor  and  multiplication  functions,  and  closed  under  the  op- 
erations of  explicit  transformation,  composition,  and  limited  recursion.  While  SI  includes  all 
the  base  functions  and  the  operation  of  explicit  transformation  and  composition,  it  does  not  in- 
clude the  operation  of  limited  recursion.  Limited  recursion  is  a much  more  potent  combining 
rule  than  any  allowed  in  the  construction  of  Q.  Still  we  are  willing  to  allow  more  complex  func- 
tions in  our  base  class  of  functions  If  than  are  provided  in  the  base  set  for  £ Moreover,  since 
‘•’3  may  be  produced  by  adding  exponentiation  to  the  class  of  base  functions  which  generate 
then  exponentiation  cannot  be  in  Since  it  might  be  possible  to  include  exponentiation  in  It, 

Q may  contain  functions  not  in  £,. 

On  the  other  hand,  £ contains  the  form  of  Kleene's  predicate  Ay.  T(x,  x,  y)  which  we  have 
used  [ Ritchie  63].  Thus,  f!  and  £2  arc  not  comparable.  The  basic  difference  may  be  summa- 
rized by  the  statement  that  fi  includes  a more  extensive  set  of  base  operations  while  includes 
a more  powerful  combining  rule. 
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III.  OVERVIEW  OF  THE  TECHNICAL  APPROACH 


In  Sec.  I,  I described  a very  broad  class  of  problems  and  suggested  that  an  object  called  a 
"statistical  compiler"  could  be  used  in  the  solution  of  these  problems.  In  Sec.  II.  I demonstrated 
that  the  class  was  too  broad  and  described  a restricted  class  of  computations  called  computation 
trees  which  can  be  attacked  with  a statistical  compiler  approach.  In  this  section,  I want  to  indi- 
cate the  major  parts  of  a statistical  compiler  and  describe  the  similarities  and  differences  with 
conventional  compilers. 

There  are  a number  of  steps  which  must  be  performed  to  convert  a program  written  in 
some  language  to  code  which  can  be  executed  by  a computing  machine.  These  steps  indicated 
in  T ig.  Ill- 1 involve  the  breaking  down  of  the  text  of  the  program  into  basic  units  (lexical  anal- 
ysis), the  combination  of  those  units  into  well-formed  expressions  in  the  language  (syntactic 


Fig.  I1I-1.  Major  compiler  phases. 

analysis),  extracting  the  intended  operations  from  the  expressions  (semantic  analysis),  and  pro- 
ducing the  code  to  execute  the  program  (code  generation).  In  addition,  one  or  two  phases  of  code 
optimization  may  be  employed:  one  between  the  semantic  analysis  and  code  generation,  per- 
forming optimizations  based  on  fiow  analysis  (i.e.,  dead  variable  analysis,  code  motion)  and 
another  following  or  part  of  the  code  generation  performing  local  machine-dependent  optimiza- 
tions (i.c.,  register  allocation,  elimination  of  unnecessary  loads). 

In  any  particular  compiler,  these  phases  may  not  be  distinct  in  either  time  or  code;  however, 
all  compilers  must  perform  these  basic  operations  to  compile  a program.  An  extensive  discus- 
sion of  code  optimization  techniques  may  be  found  in  [Schaefer  73  |. 

In  a statistical  compiler,  the  same  major  phases  can  be  observed.  Moreover,  some  of  ‘lie 
phases  are  hardly  modified  at  all,  whether  employed  in  a statistical  or  conventional  compiler. 

The  lexical  analysis,  syntactic  analysis,  and  semantic  analysis  which  can  produce  computation 
trees  as  described  in  Sec.  II  are  straightforward  applications  of  techniques  currently  in  use 
|Aho  72). 

On  the  other  hand,  the  later  phas  ■ compiler  must  be  significantly  different  from  a 

conventional  compiler  because  the  s antic  associated  with  the  operations  in  the  computation 
trees  are  very  different.  Now,  ins  id  of  it  ion  as  a basic  operation,  convolution  becomes 
the  basic  operation.  'These  changes  i u ly  affect  the  code  generation  phase  which  must  choose 
appiopriate  representations  for  the  variables  and  prouuce  code  for  the  operations  on  those 
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representations.  The  subject  of  efficient  and  accurate  representation  techniques  is  at  the  heart 
of  constructing  a statistical  compiler  and  the  issues  are  explored  in  Sec.  V. 

The  need  for  optimization  of  the  operations  requested  by  the  semantics  is  also  apparent. 
Convolution  operations  are  fairly  expensive  of  computer  time,  thus  any  reductions  that  can  be 
performed  at  compile  time  will  pay  off  handsomely  at  run  time. 

I have  indicated  two  distinct  phases  of  optimization:  global  and  local.  Local  optimizations 
which  occur  toward  the  tail  end  of  code  generation  will  deal  with  improving  the  code  produced 
for  specific  cases,  but  are  not  prepared  to  deal  with  the  important  overall  strategy  questions 
which  affect  the  amount  of  storage  and  execution  time  required  in  a first-order  way. 

As  the  small  ECL  example  in  Sec.  I indicated,  a key  question  in  producing  accurate  moults 
is  the  joint  dependence  of  the  variables  entering  into  each  step  in  the  computation.  But  which 
joint  distributions  must  we  keep?  Joint  distribution  functions  will  require  large  amounts  of 
storage,  and  execution  time,  both  probably  increasing  exponentially  with  the  number  of  variables. 

In  the  Sec.  I example,  only  the  joint  distribution  of  X and  Y is  needed  to  accurately  calculate 
the  distribution  for  Z.  However,  without  looking  ahead  to  the  use  of  X and  Y,  there  is  no  way  to 
know  that  one  can  afford  to  discard  information  about  the  joint  distributions  of  X and  A,  or  X 
and  B.  Indeed,  if  we  did  not  know,  at  the  time  of  execution  of  the  statement  Y «-  A + ft  the  in- 
tended use  for  Y,  we  vould  be  forced  to  calculate  and  maintain  a representation  of  the  joint  dis- 
tribution for  the  variables  X,  Y,  A,  and  0. 

Thus,  we  are  led  to  the  conclusion  that  any  statistical  compiler  must  employ  optimization 
techniques  aimed  directly  at  reducing  joint  distributions  by  looking  ahead  at  the  use  of  the  vari- 
ables. S°ction  IV  describes  a series  of  transformations  to  apply  to  the  operations,  referred  to 
as  the  analogous  program,  which  must  be  performed  to  "execute"  a computation  tree.  These 
' . ansformations  start  with  an  analogous  program  which  maintains  all  possible  joint  dependencies, 
and  remonstrates  that  certain  reductions  based  on  the  analogous  program  and  the  computation 
tree  can  be  performed  without  altering  the  distribution  calculated  for  the  output  variable. 

These  simplification  rules  are  then  the  equivalent,  in  a general  way,  to  the  usual  global 
optimization  techniques  such  as  code  motion,  dead  variable  analysis,  etc.  The  question  now 
arises:  when  is  a transformation  a simplification?  or,  stated  differentlv,  which  transforma- 
tions should  be  applied  to  reduce  the  required  effort  at  run  time?  T1  rules  themselves  merely 
guarantee  that  their  application  will  not  affect  the  accuracy  of  the  > esult,  but  only  the  efficiency. 

This  question  cannot  be  answered  solely  in  terms  of  the  transformations,  since  estimates 
of  the  cost  of  the  analogous  program  are  necessary.  For  example,  it  may  well  be  necessary  to 
know  whether  it  is  better  to  store  a joint  distribution  of  three  variables  or  calculate  ten  con- 
volutions. These  questions  obviously  depend  critically  on  the  representation  techniques  which 
are  employed. 

The  transformations  described  in  Sec.  IV'  are  intended  to  be  independent  of  the  choice  of  rep- 
resentation, and  thus  no  direct  answer  to  these  questions  is  presented  in  this  report.  Rather, 
the  approach  here  is  to  describe  the  options  for  transformations  in  Sec.  IV,  the  representation 
options  in  See.  V,  and  indicate  some  specific  ways  of  employing  these  in  Secs.  VI  and  VII. 

The  more  general  approach  of  attempting  to  describe  the  sequence  of  transformations  to  be 
performed  based  on  cost  functions  for  the  analogous  program  lias  some  basic  flaws  at  this 
time.  The  first  is  that  understanding  the  nature  of  these  cost  functions  will  require  more  ex- 
perience with  the  representation  techniques  in  Sec.  V to  reveal  the  basic  properties  common  to 
all  such  cost  functions.  Further,  the  general  statements  which  might  be  obtained  bv  tl  is 
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approach,  given  the  vague  properties  which  can  be  ascribed  to  the  cost  functions  in  a repre- 
sentation independent  fashion,  lead  me  to  the  conclusion  that  these  statements  are  likely  to  be 
vacuous.  For  example,  although  it  is  possible  to  suggest  some  hill-climbing  approach  to  im- 
proving the  analogous  program,  the  available  steps  may  well  be  so  gross  that  no  continuity  is 
apparent. 

For  these  reasons,  1 have  chosen  in  this  report  to  focus  on  the  issues  faced  in  the  imple- 
mentation of  a particular  statistical  compiler,  drawing  from  the  general  techniques  in  Secs.  I\ 
and  V as  necessary. 


IV.  SIMPLIFICATION  TECHNIQULS 

In  this  section,  we  discuss  rules  which  allow  uf  to  alter  the  analogous  progr  n for  a 
particular  tree  in  ways  which  preserve  the  correctness  of  the  computation.  Five  such  rules 
and  their  proofs  are  presented  here.  The  reason  for  stopping  at  five  is  the  simple  one  that 
I can  think  of  no  others  which  apply. 

Throughout  this  chapter,  we  will  make  the  noncritical  simplifying  assumption  that  our 
base  functions  are  binary. 

Notation 

«ve  will  often  have  occasion  to  refer  to  sets  of  nodes  in  a computation  tree,  and  we  will 
u'e  the  usual  set  notation  {A.  B,  C,  D).  Each  node  will  be  named  by  a letter  of  the  capital 
alphabet  and  perhaps  a subscript.  Thus,  a set  of  nodes  might  be  denoted  as  (X  , X,, ...  X }. 

We  may  abbreviate  this  as  {X}.  The  condition  that  node  A has  been  evaluated  to  "a"  during 
a particular  evaluation  of  the  computation  tree  will  be  denoted  as  A = a.  Similarly,  the  con- 
dition for  a set  of  nodes  will  be  denoted  {A  = a,  B = b,  C = c,  D = d}.  It  will  often  be  convenient 
to  abbreviate  this,  and  we  will  use  square  brackets  around  the  set  to  indicate  it,  thus  [A,  B,  C,  D] 
is  the  same  as  {A  = a,  B = b,  C = c,  D = d}  and  similarly  [X]  = {Xj  = x},  X^  = x^,  X^  = x , . .. 
Xn  = xn^‘  The  S£luare  brackets  may  be  read  as  "the  state  of"  and  thought  of  in  that  manner. 

The  brackets  then  identify  an  event  in  an  event  space  which  is  specified  by  the  contents  of  the 
brackets.  Note  the  assumed  correspondence  between  variables  with  upper-case  names  and 
variables  with  lower-case  names. 

4.1  Given  a computation  tree  whose  output  variable  is  A,  we  will  use  P{A  = a}  or  P[A|  to 
denote  the  probability  mass  function  which  is  the  desired  result  when  given  the  probability  mass 
functions  for  the  inputs.  We  will  assume  throughout  that  the  terminal  node  distributions  are 
statistically  independent.  Now,  if  the  set  of  all  nodes  in  the  tree,  excluding  the  root  A is 
denoted  by  {X(1  Xz, . . . Xk}  and  fA  is  the  function  associated  with  A,  and  A's  left  and  right 
ancestors  are  X.  and  X , respectively,  then  we  may  write  the  following  trivial  expression  for 
P{A  = a}: 

V 


(1) 


P[A]  = 


H[X1,X2,...Xk, 


x \ • X2*  * * * xk 
>fA(xi*  V = a 

This  expression  is  clearly  not  satisfactory  from  a computational  viewpoint,  although  it  is 
possible  to  utilize  it  to  compute  P{A  = a}.  The  computation  implied  is  to  form  the  k-dimensional 
probability  mass  function  by  listing  all  the  k-tuples  for  which  the  probability  is  nonzero  (by 
enun  crating  all  the  possible  combinations  of  the  values  of  the  input  variables  and  evaluating  all 
the  intermediate  results)  and  evaluating  the  probability  mass  function  as  the  product  of  the 
probabilities  of  values  of  the  independent  input  variables. 

We  may  then  always  write  trivially  a program  as  in  (1)  for  the  computation  of  the  probability 
mass  funct'on  of  the  output.  These  programs  are  enormously  expensive  to  compute  as  indicated 
above.  Thus,  we  present  a set  of  transformation  rules  which  may  be  applied  to  such  programs 
and  which  are  guaranteed  to  maintain  correct  results  for  the  computation.  Applications  of  these 
rules  may  be  used  to  transform  a program  into  a broad  variety  of  computational  variants,  some 
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of  which  should  be  less  costly  to  compute.  In  the  next  section,  we  will  demonstrate  the  technique 
on  a simple  example  program  and  in  Sec.  4.3,  we  present  formal  statements  of  the  rules  and 
prove  that  their  application  does  not  change  the  value  produced  by  the  program. 


4.2  Example 

Consider  the  computation  tree  shown  in  Fig.  rV-1  which  represents  the  computation 
Z = W2^'  fy*W2'  W3^‘  fB**VW2’  W3^’  W4^‘  We  may  write  quite  generally  that 

(2)  P[Z)  = £ PJA.B.X.  Y,  Wr  W2,  W3,  W4) 

Wj,  w2,  w3>  w4> 

x,  y,  a,  b 
af^ia.  b)  = z 


We  may,  by  the  definition  of  conditional  probability,  rewrite  this  as 


(3) 


P[Z)  = 


P[A,  n.  x,  \vr  w4|w2,  W3, 3 1 P[W2,  w3,y| 


Wj,  w2>  w 3,  w4, 
x.y,  a,  b 

af^la,  b)  = 7. 


where  we  have  chosen  to  condition  on  those  nodes  which  are  ancestors  of  both  A and  U.  In 
this  conditional  distribution,  A and  B are  now  statistically  independent,  since  we  have  fixed 
the  values  of  their  common  ancestors,  and  their  other  terminal  ancestors  are  independent. 
Thus,  we  may  write. 


H)  P[Z)  = ^ p|A,x,Wj  w2.  \v3,v]  Pin.  \v4|v\2,w3,  Y]  x 

w,,  w2,  w3,  w4.  p|W2>  W},  Y ) 

x.y.a,  b 
a f^ia,  b)  = z 


Consider  the  use  of  the  random  variable  X in  this  equation.  It  appears  only  once  and  is  summed 
over.  Since 


s 

I 

4,  *j 


i 

i 

I 


: 

: 

1 


JL 


We  may  simplify  (4)  to  obtain 


(6) 


PfZ]  = 


Y 


Wj.  w2,  w3,  w4, 

y,  a,  b 

afz(a,  b)  = z 


P[A,  Wj  ! W2,  W3.  Y]  P|B,  W4 1 W2,  W3.  Y]  X 
P[W2,W3,Y1 


We  may  perform  the  same  simplification  on  random  variables  Wj  and  W4  to  obtain 


(7) 


PfZ]  = 


Pf  A|  W7.  W3,  Y ] X P[B|  W?,  Wv  Y]  P[W7,  W,,  Y) 


w2.w3, 
,v.  a,  b 


afz(a,  b)  = z 

Note  that  in  the  original  computation  tree,  if  we  break  all  the  links  which  descend  from  Y,  that 
W3  no  longer  has  a path  to  A.  We  say  that  {y}  separates  {A}  from  {Wj}.  Thus  in  the  condi- 
tional distribution  for  A which  is  conditioned  on  W,,  W,,  and  Y,  we  may  eliminate  W„  from 
the  condition  because  it  provides  no  additional  information  to  aid  in  evaluating  the  probability 
that  A = a.  Thus,  we  have 


(8) 


PfZ]  - 


P f A | W2,  Y ] x P[R|W,,W3,  Y]  x P[W2,W3>  Y] 


w2,w3,y, 
a,  b 

3 fz(a,  b)  = z 

Note  that  {Y}  separates  { B}  from  {W2>  W3}  and  thus  we  may  write 


(9) 


PIZ1  = 


P[A|W2,  Y]  x P [ B | Y ] x PfW2.  W , Y] 


w2,w3,y, 
a,  b 

3 f^  la,  b)  = z 

W3  now  appears  only  in  the  last  factor  and  may  thus  be  removed  to  obtain: 

(10)  P[Z]  = £ P[A|W2,  Y]  x P[B|  Y]  x P[W2,  Y| 

w2,y,a,  b 
afz(a,  b)  = z 

Now  by  using  the  definition  of  conditional  probability  we  may  rewrite  this  as 

(11)  PfZ]  = £ P [A,  W2,  Y]  x P [R | Y] 

w2. y. a,  b 
3fz(a,  b)  = z 

Now  W’2  appears  only  once  and  may  be  removed. 


(12) 


PfZ]  = 


PfA,  Y]  x P[B|  Y] 


y.a,  b 

3t'z(a,  b)  = z 

Finally,  by  the  definition  of  conditional  probability,  we  have 


. 
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(13) 


pfzl  - 2 p[A|  Y)  x P[B|  Y)  x P[y] 


y.a.  b 

3fz(a,  b)  = z 


Now  the  time  to  perform  this  calculation  is  proportional  to  the  product  of  the  number  of 
nonzero  probability  values  which  are  assumed  by  y.  A.  and  13,  whereas  the  original  expression 

r7lrCd  UmG  Pr°P0rtiOnal  l°  the  0roduct  ^e  number  of  nonzero  probability  values  for 
** , 2*  W3*  W4-  Thls  assumes  a simple-minded  calculation  scheme  for  die  f-eonvolntions 

implied.  If  we  assume  that  the  number  of  nonzero  probability  values  for  W W W and 
W4  is  the  same  number  n,  and  further  assume  that  the  number  of  points  in  Y.  A.'lnd  B are 
inear  multiples  ot  then  the  second  computation  involves  n3  operations  as  opposed  to  n4. 
Whether  these  assumptions  are  met  depends  upon  the  spacing  of  the  arguments  in  relation  to 
the  t unctions  used.  If  points  are  spaced  linearly  and  the  functions  are  additive,  then  typically 
we  will  generate  fewer  than  n2  distinct  points  in  forming  the  f-convolution.  If  all  points  are 
spaced  evenly  at  the  same  intervals,  the  number  of  points  in  the  resultant  distribution  is  2 >.  n. 

4.3  We  now  state  formally  the  rules  which  we  have  used  to  simplify  the  expressions  in  the 
example  above. 

1)  Definition  cf  Conditional  Probability 

P[X,  .X]  = P[i|  Y|  X P[X| 

This  is  merely  the  usual  statement  of  the  delinition  of  conditional  probability.  Note  that  if 
i [Y|  = 0 then  the  conditional  probability  is  undefined. 

2)  Marginality  of  unique  variables 
If  we  have 

(2)  2 p[K.X|Y]  x P[VV|I]  x ... 

fa}.  k 

JB({z}) 

Where  H is  a boolean  condition  not  involving  k.  and  k appears  only  once  as  shown  then  we 
may  rewrite  as 

,3)  2 P[X|V|  x P[W|I]  x ... 

fa} 

aB({z}) 

where  the  fa}'s  represent  all  the  small  letter  variables  in  [X],  [Y],  [W],  [J.| and  thcre 

may  be  as  many  more  terms  a0  desired. 

The  rule  states  that  when  a variable  appears  only  once  and  is  summed  over,  then  we  need 

not  have  the  details  oi  its  joint  distribution  with  the  other  variables  in  order  to  perform  a 
correct  calculation. 


3)  Conditional  Independence 


Deiinition:  The  least  common  ancestor  set  of 

(1)  are  common  ancestors,  and 

(2)  has  the  prope-ty  that  any  path  from  a 
through  a node  in  the  set. 


A and  B is  the  smallest  set  of  nodes  which 
common  ancestor  of  A or  B passes 
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If  we  have 


(4)  P[A,  B,Z,X|Y] 

where  {Y}  includes  all  the  least  common  ancestors  of  A and  B,  and  {X}  contains  ancestors  of 
A only,  {Z}  contains  ancestors  of  B only,  then  we  may  rewrite  this  as 

(5)  P[A,X|Y]  x P[B.Z|Y) 

This  is  true  because  the  input  variables  are  independent  and  once  we  are  given  the  values  for 
(VI,  then  there  is  no  dependence  of  A and  B on  any  common  input  variables  and  so  they  are 
independent. 

Proof  of  Conditional  Independence 

Given  a computation  tree  as  shown  in  Fig.  IV -2,  divide  all  the  inputs  to  the  tree  into  four 
sets  depending  on  whether  they  are  ancestors  of  A,  ancestors  of  B,  ancestors  of  A and  B, 
or  ancestors  of  neither. 


The  set  of  input  variables  which  contains  ancestors  of  A only  we  will  refer  to  as  4>{X} 
since  X includes  all  the  ancestors  of  A only.  Similarly  4>{i'.}  is  the  set  of  input  variables 
which  consists  oi  all  the  ancestors  of  A and  B,  and  4>{z}  includes  all  the  ancestors  of  B only. 

The  state  of  Y.  (i.c.,  the  values  of  all  the  variables  in  Y)  is  completely  determined  by  the 
state  of  4>{y}.  We  will  write  this  rather  loosely  as 

(6)  [Y|=  fv([4-{Y}]) 

The  state  of  (ii)  is  determined  not  only  by  [ * { X}  ] but  also  by  [i]  since  {x}  may  have 
ancestors  in  {y}.  Thus 

(V)  [X]=  fx([*{X}].  [YD 

and  similarly 

(»)  (X|=  fz([YJ.  [ + {Z}|) 

Now  we  may  write  that 

(9)  P[A.  X,  B,  Y.  X]  = V P [4>f  Y>  ] x P[4-fx}]  x P[  + {z}| 

[*{V}|,  [ + {X}  ], 

Hzli 
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where  the  sum  is  carried  over  all  [*{*}].  [*{z}]  and  [*{£}]  such  that 

fA(xi,yj>  = a 
fB(yk*  zn*  = b 

[Y] =  fy([4>{Y}])  (I) 

[X]=  fx([*{2f}].[X]) 

[Z]  = fz([Y],  [*{Z}J) 

Note  that  this  statement  is  straightforward,  since  the  complete  states  of  {x} , {l},  {£},  A, 
and  B are  determined  fully  by  knowledge  of  the  states  of  *{x}.  *{y},  and  ${z}.  Further,  for 
any  state  of  «*>{  Y}  meeting  condition  (1),  the  set  of  states  of  ${£}  and  ${z}  which  also  meet 
condition  (1)  is  the  same.  This  is  because  f Y]  is  fixed  during  the  summation  and  the  choice  of 
valid  [*{X}j  and  [4>{Z}]  is  determined  by  [ Y],  [X],  [Z]  and  not  by  [*{*}].  Thus,  since  we  have 
a complete  cross  product  we  may  rewrite  (9)  as 

(10)  P[A,  X,  B,  Y,  Z]  = Yj  P[*{Y}j  x Y P[*{x}]  X P[*{2}] 

[*{Y}1  [*{x}],  [4* { Z}  ] 

where 


m = fY([4.{  Y}  ]) 
in  the  first  sum  and 

fA(xi-yj)=a 

fB(yk’  zn>  = b 

[X]  = fx([4>{X}],  [Y]) 

[Z]  = fz([Y).  [*{Z})) 

in  the  second  sum. 

Now,  we  may  also  write  that 


(II) 


(III) 


(111  P [Y]  = Y P[*{Y)l 

f*{Y}] 

where  the  [4>{y}]  satisfy  condition  (11). 

So  we  may  then  write  that 


(12)  P[A,  X,  B,  Z|  Y]  = Y PKUlXPIlfz}] 

[*{X}].  [4>{Z}  ] 

where  the  sum  is  over  all  ’ { X}  ],  [4> {z}  ] which  satisfy  condition  (III).  If  condition  (II)  is  not 
met  for  any  state  of  <1>{y}  then  the  value  for  (12)  is  undefined. 

Similarly,  we  obtain 


(13)  P[A,X|Y]=  Y P[*{X}] 

[*{X}| 

where 


fA(xi*  yj>  = a 

[X]=  fx([*{x)l.  [Y])  (IV) 
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or  undefined  if  there  is  no  [4«{  Y>  ] sueh  that  [V|  = fv([4>{  Y}  I).  Also, 

* 

<«>  P[B,Z|Y]=  £ P[4.{z}] 

where 

fB(yk*  zn>  = b 

III  = fz([Vl,  [*{Z}])  (V) 

and  undefined  if  there  is  no  [<I>{y}]  sueh  that  [Y1  = f Y([4> {x}]). 

Now  to  complete  the  proof  we  need  to  show  that 

(15)  P[A,X,  B,Z|Y)  = P[A.X|Y]  x P[B,Z|Y) 

When  there  is  no  [*{*}]  such  that  [Y]  = fy([*{  Y}  ]?.  then  both  sides  are  simultaneously  undefined. 
Otherwise,  we  wish  to  show  that  for  any  fixed  [ Y ),  [X],  and  [Z] 

(16>  E p[*{S}l  x P[*{Z}]  = £ P [4>{X}  1 x J P[4-{Z}) 

[ 4> { X}  1 [4« { Z}  ] 

where  in  the  first  sum,  condition  (HI)  must  b<  met,  in  the  second  sum  condition  (IV)  must  be 
met,  and  in  the  third  sum  condition  (V)  must  be  met. 

Now,  for  any  [*{x}]  which  satisfies  condition  (IV),  the  set  of  states  for  f-fe}  whieh  satisfies 
condition  (III)  is  independent  of  the  choice  of  [*{x}|.  Again.since  we  have  a complete  cross  prod- 
uct,the  factoring  of  the  sum  is  correct.  Thus,  (If  ; is  an  identity. 

{ 

4)  Separating  Set  Rule 
If  we  have 

(17)  P(X|  Y,  W]  x P[Z,  Y,  W] 

where  {Y>  separates  {X}  from  fW}  (i.e.,  any  path  from  W.  to  X includes  at  least  one  Y ) and 
{X}  has  no  ancestors  in  common  with  {w}  not  already  included  in  {W},  and  {Z}  is  disjoint  from 
(Jl)  and  {W},  then  we  may  rewrite  as 

<18)  P[X|  Y]  x P[Z,  Y.  W| 

This  is  true  since  [Y]  contains  all  the  information  in  [&’]  which  is  used  in  calculating  [X],  so 
that  the  additional  information  embodied  in  [W]  cannot  change  the  probability  of  [X]. 

Proof  of  separating  set  rule  (see  Fig.  IV-3) 

As  in  the  proof  ot  conditional  independence,  we  have  sets  4>{X},  >1>{y},  and  4>{w}  whieh  are 
the  set  of  input  variables  whieh  feed  the  sets  {X},  {y},  and  {w},  respectively.  To  be  more 
precise,  <1>{W}  includes  all  those  input  variables  which  are  ancestors  of  elements  in  {W}.  4>{y} 

includes  all  those  input  variables  whieh  arc  ancestors  of  elements  in  {Y}  and  not  included  in 
similarly  includes  all  those  input  variables  which  are  ancestors  of  elements  in 
{X}  and  are  not  included  in  either  4>{w}  or  <f>{  Y}.  Then  we  may  write 

fW|  = fw([4>{  W}  )> 

[II  = fy([W],  [*!>{  Y>  ]) 

[A|  = fx([YJ,  [*{X}|) 
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Xiy  assumpt.on,  the  sets  4>fX}.  4.{y},  and  *{W}  are  disjoint  sets  of  statistically  independent 
variables  and  so  we  may  write 


(19) 


P [X,  Y.  W] 


V 

LJ 


(f{x}  1.  f*{ y}  1, 

[4»fW}] 


P[4.fX}]  x P [ <i>  { Y } ] x P [4>{  W}  ] 


where  the  sum  is  over  all  [4>{X}].  [<t{ V}  ] and  [4>{^}]  such  that 

[W]  - fw([*{ffi}]) 

m = rY([wt,  i*fx}]) 

[X]  = fx([V).  [4>{x}  ]) 


This  sum  may  be  written  as 


(20)  P[X,Y.W]=  V P[4>{X}lx  V P[*{Y}]  X V P[4>{  W>  ) 

[*{x}]  [«*»{  Y>  ] [4>{W}  ] 

sinee  for  any  [4>{x}]  meeting  condition  (VI),  the  sets  of  states  of  4>{y}  and  <t>{\y}  meeting  condition 
(VI)  do  not  ehange  and  similarly  for  4>{x}  and  Similarly, 

(21)  P [ Y,  W]  = V P(4»{  Y}  ] x Y P[<J>{w}] 

[>*»{  Y>  ] [4>{W>  ] 

where 

[W]  = fw((4>{w}])  I 

[Y]  = fy((W],  [*{Y}])  j (VII) 

Thus, 

(22)  P[X|Y.JS(]=  7 P[*{X}] 

(4-{X}] 
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where  for  some  [4>{y}  J and  [*{»}).  condition  (VII)  is  met  and  [X]  = [*{2}])  and  undefined 

if  condition  (VII)  is  not  met  for  some  [4>{y}  ) and  [*{W}].  Now 

(23)  P[X,  Y)=  2 P[*{x})x  Yj  P [4»{  V}  ] x P[$] 

[*{X}J  [*{*}].  [W] 

where 

m = fY([W],  [*(y}j)  1 

[X]  = fx([Y),  [*{X}»  J <VIII> 

and 

(24)  P[Y)=  J P [4>{ X}  ) x P ] 

[*{Y}],  [W] 

where 

fY)  = fY([W),  [+{y}D  (IX) 

So 

(25)  P[X|  Y]  = £ P[4>{x}] 

[*{x)l 

where 

[X]=  fx([YJ,  (*{2£}J)  (X) 

and  undefined  if  there  is  no  [4>{x}]  such  that  (IX)  is  met. 

This  is  identical  with  (22)  except  in  the  case  that  P[X,W]  = 0 and  this  case  is  irrelevant 
since  both  sides  are  multiplied  by  P [Z,  X.  W). 

5)  Elimination  of  Complicating  Variables 
If  we  have 

(26)  P[X|Y,.W]  X P[X,  W,Z] 

where  all  the  ancestors  of  {y}  are  contained  in  {w},  and  {z}  is  disjoint  from  {y}  and  {$},  and 
{X},  { Y}  and  {W}  are  disjoint,  then  we  may  rewrite  as  follows. 

(27)  P[X|W]  x P[Y.W,Z] 

Proof  (see  Fig.  IV-4) 

Let  4>{\v}  be  the  set  of  all  input  variables  which  are  ancestors  of  nodes  in  {w},  and  let 
4>{x}  be  the  set  of  inpi  r v;i»-iables  which  are  ancestors  of  nodes  in  {x}  and  not  already  included 
in  4>{w}.  Then  we  have 

[W]=  fw([*{w}l) 

[Y]  = fY([W]) 

LX)  = fx([W],  [Y],  [*{x}]) 
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So  we  have  trivially  that 

(28)  P(X,Y.W]=  J P[+{W}]  X P[4>{2}] 

[+{»}].  [*{x}] 

where 

[W] =  fw(f+{w})) 

[X]  = fx([W],  m,  [4>{2C}p 

We  may  also  write 

(29)  P[Y.W]=  V P[4,{^}) 

(+{W}] 

where 

[W)=  fw([*{w}]) 

[Y]  = fY((W|) 

So  we  have  that 


(XI) 


(XII) 


(30) 


P(X|  Y.  W]  = 


2 P[4>{  W)  ] x P [ 4>  { X}  ] 


P[${wjj 


undefined  if  there  is  no  *{w}  such  that  (XII)  is  met.  Now,  we  may  similarly  write  that 


(3D  P[X,W]=  J P[4>{w})  x P[4>{x}] 

[4>  {X} ) 

where 


[W] =  fw((4>{w}]) 

(X]  = fx((W],  fY([W]),  [*{x}]) 

and 


(XIII) 


(32) 


P(W]=  V P[4.(W}] 


where 


[W]  fw([*{w}j) 


(XIV) 


Thus,  we  obtain 


(32) 


P[X|W]  = 


P[*{W}]  x P[*{X}] 


s 

[4»{W>  ] 


P [4>{  W} ) 


or  undefined  if  there  is  no  4>{w}  satisfying  condition  (XIV).  Comparing  this  to  (30),  we  find 
the  expressions  are  identical,  except  when  P[Y,  W)  is  zero.  But  in  that  case  the  values  are 
irrelevant  since  both  will  be  multiplied  by  zero. 

4.4  We  now  indicate  that  the  transformations  indicated  in  the  example  are  all  instances  of 
applications  of  the  rules  just  presented.  Referring  back  to  Sec.  IV -2,  we  find  that  we  move  from 
(2)  to  (3)  by  the  application  of  Rule  1 with  X = {A,  B,  X,  WJ(  W4}  and  Y = { W W , Y}.  We  move 
from  (3)  to  (4)  by  Rule  3 with  X = {X,  W^,  Z = {W4},  Y = {w.,,  W3>  Y}.  We  move  from  (4)  to 
(7)  by  three  applications  of  Rule  2 to  the  variables  X,  W},  and  W4.  This  can  be  pursued  simi- 
larly throughout  the  example. 


Conclusion 

The  transformation  rules  described  above  must  be  applied  in  some  sequence  in  order  to 
simplify  the  statistical  compilation.  The  sequence  of  application  is  in  turn  determined  by  the 
cost  of  various  possible  statistical  compilation. 

In  the  next  section,  we  will  discuss  the  various  representation  choices  which  may  be  em- 
ployed by  the  implementor  of  a statistical  computer.  The  implementor  will  have  to  decide  which 
of  these  transformations  should  be  used  to  improve  the  running  time  of  the  programs  based  on 
the  different  representations  employed. 
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V.  REPRESENTATION  TECHNIQUES 


A key  issue  which  will  faee  the  implementor  of  any  statistical  compiler  is  how  to  represent 
the  random  variables  in  his  implementation.  This  section  is  a survey  of  possible  answers  to 
that  question  and  is  not  intended  to  be  complete;  rather,  it  tries  to  give  an  indication  of  the 
range  of  possible  choices  and  some  of  the  effects  of  these  choices  on  the  efficiency  and  accuracy 
of  the  computations. 

In  choosing  a representation  for  random  variables,  two  key  design  parameters  must  be 
explored.  The  first  of  these  is  representational  freedom,  i.e.,  what  range  of  distributions 
functions  can  be  described  accurately?  For  example,  if  a user  arrives  with  an  empirically 
determined  distribution,  how  accurately  can  this  distribution  be  described  to  the  system?  As 
the  representational  freedom  is  narrowed,  fewer  parameters  are  needed  to  select  a particular 
distribution  and  storage  space  is  economized.  Moreover,  certain  calculations  on  th ' random 
variables  represented  by  these  distributions  also  become  simpler.  For  example,  if  the  chosen 
representation  is  by  gaussian  distributions,  then  addition  of  the  random  variables  is  simple  to 
perform,  but  very  few  distributions  can  be  accurately  portrayed. 

Another  design  parameter  is  the  class  of  base  functions  (referred  to  in  Sec.  II  as  If)  which 
determines  the  class  of  computation  trees  n that  the  user  may  construct.  As  we  indicated  in 
Sec.  II,  in  order  to  include  a function  f in  ,‘t,  we  must  know  computational  techniques  for  eval- 
uating the  representation  ->f  f(X,  Y)  where  X and  Y are  random  variables  with  representations 
from  the  chosen  representation  class.  Moreover,  the  techniques  must  extend  to  handle  the 
cases  where  X and  Y are  not  independent  but  rather  a joint  distribution  is  represented.  We 
refer  to  this  class  of  computations  throughout  this  report  as  f-convolution  because  of  its  simi- 
larity to  the  usual  convolution  operation.  Indeed,  when  the  representation  of  distributions  is  in 
terms  of  their  probability  density  functions,  • + ' - convolution  of  independent  random  variables 
is  precisely  the  standard  convolution  operation  while  - convolution  is  better  known  as 
correlation. 

The  actual  arithmetic  operations  which  are  performed  in  the  computer  to  calculate  an  f- 
convolution  are  strongly  affected  by  the  chosen  representation  and  may  be  either  simple  or  dif- 
ficult for  a particular  f depending  on  the  representation.  Moreover,  given  a particular  class 
of  operations,  the  class  may  not  be  closed  under  our  f-convolution  operation.  Thus  if  Rj.  and 
Ry  are  the  representations  of  the  random  variables  X and  Y,  the  calculation  of  the  representa- 
tion for  f(X,  Y)  involves  two  steps:  (1)  calculation  of  an  exact  result  for  f(X+,  Y’:'),  where  X*  is 
the  variable  exactly  represented  by  R^,,  and  (2)  selection  of  a representative  from  the  represen- 
tation class  for  Rj.^  Y)-  Ltoth  steps  represent  potential  sources  of  errors  which  must  be  in- 
cluded in  an  analysis  of  the  accuracy  of  the  results  of  choosing  a representation,  although  for 
some  representations,  which  are  closed  under  f-convolution,  the  second  step  is  not  an  issue. 

For  example,  if  we  choose  addition  as  a member  of  the  class  of  base  functions,  then  it  is 
unreasonable  to  choose  a representation  which  is  limited  to  uniform  distributions.  The  repre- 
sentation of  the  sum  of  two  independent  random  variables,  each  of  which  is  represented  by  a 
uniform  distribution,  is  never  a uniform  (except  if  one  distribution  is  the  degenerate  case  of  a 
point  distribution),  and  a uniform  approximation  would  entail  a significant  loss  of  accuracy.  See 
Fig.  V-l  for  an  example  of  this. 

Statisticians  have,  of  course,  spent  much  time  and  effort  in  developing  techniques  to  de- 
scribe distributions.  They  have  not,  as  a rule,  concerned  themselves  with  representations 
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which  are  convenient  for  f-convolution  operations  as  necessary  here.  It  is  our  intent  to  briefly 
indicate  some  of  the  statistical  approaches  to  the  representation  problem  anil  the  characteristics 
of  these  representations  for  statistical  compilers. 

We  will  assume,  except  where  explicitly  stated  otherwise,  that  the  computer  representation 
of  REALs  is  exact.  While  this  is,  of  course,  not  true,  it  is  a much  smaller  source  of  error  in 
an  implementation  than  the  other  errors  discussed  in  this  section.  Moreover,  extended  preci- 
sion arithmetic  can  be  used  to  correct  this  error  in  well-known  ways,  whereas  the  other  error 
sources  are  not  as  easily  controlled. 

Random  Variables 

We  begin  by  reviewing  what  is  meant  by  a random  variable.  A typical  definition  may  be 
found  in  Teller  | Feller  57].  We  must  first  have  a probability  space  which  is  constructed  of 
elements.  These  elements  are  the  basis  for  an  clgebra  of  sets  (a  ir-algcbra)  and  a probability 
measure  P on  those  sets.  Then  wc  have 

Definition:  A random  variable  is  a function  p on  a probability  space 
such  that  for  each  real  t the  set  of  points  x where  p(x)  =*  t belongs  to 
the  underlying  u-algebra. 

The  definition  basically  insures  that  any  real  function  which  provides  a well-defined  distribution 
function  is  a random  variable.  That  is,  the  function  F(t)  P|p(x)  t)  is  guaranteed  to  be  well- 
defined  by  the  definition  of  random  variable. 

Despite  the  generality  of  the  definition,  a number  of  properties  of  distribution  functions  can 
be  directly  determined  from  the  properties  of  the  e-algebra  For  example,  the  Jordan  and 
Lcbesgue  decomposition  theorems  state  that  any  distribution  function  I can  be  expressed  as  a 
mixture  of  three  types  of  distribution  functions. 

1 P'aC  + ^'S  + rlA 

where  p 5 0,  q > 0,  r > 0,  and  p + q + r - 1.  F^(  is  absolutely  continuous,  F<,  is  continuous 

but  singular  (i.e.,  concentrated  on  a set  of  measure  0),  and  F^  is  atomic  (i.e.,  concentrated  on 
the  set  of  its  atoms,  where  atoms  arc  single  points  with  a positive  probability  weight). 
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For  our  work,  we  will  assume  that  = 0;  that  is,  our  distribution  functions  do  not  include 
components  whici'  are  continuous  on  sir0uiar  sets,  We  regard  these  as  curiosities  not  likely 
to  be  observed  in  general  practice. 

We  will  look  separately  at  absolutely  continuous  distribution  functions  and  atomic  distribu- 
tion functions  and  the  representations  appropriate  to  each  type.  We  can  then  explicitly  repre- 
sent any  more  general  distributions  as  a mixture  of  the  two. 

Absolutely  Continuous  Distributions 

We  will  start  with  the  absolutely  continuous  case.  If  F is  a distribution  function  which  is 
absolutely  continuous,  then  chere  is  an  associated  probability  density  function  (pdf)  f which 
exists  almost  everywhere  such  that 

F(x)  = f f(u)  du 

^ — OO 

NOTE:  We  use  upper  case  letters  to  denote  distribution  functions,  and  lower 
case  letters  to  represent  density  functions. 

Thus,  the  two  obvious  possibilities  are  to  represent  F directly  or  to  represent  f.  Since 
techniques  for  calculating  with  probability  density  functions  are  better  known  than  techniques 
for  calculating  with  distribution  functions,  we  next  turn  our  attention  to  representations  suitable 
for  density  functions. 

Probability  Density  Functions 

The  problem  of  representing  a pdf  is  very  much  the  general  problem  of  approximating  a 
function  of  a real  value.  In  fact,  the  problem  is  complicated  by  the  fact  that  many  pdfs  are  not 
even  continuous  (e.g.,  uniform  distributions),  much  less  differentiable.  Given  such  an  approxi- 
mation problem,  the  analyst  must  first  choose  i->e  "form  and  norm"  [Rice  64]  to  be  used.  The 
"form"  refers  to  the  cUiSS  of  approximating  functions  from  which  the  actual  approximation  will 
be  chosen;  the  "norm"  refers  to  the  error  measure  (i.e.,  least  squares,  least  maximum  error, 
etc.)  to  be  used  in  selecting  a particular  function  of  the  chosen  form  to  be  the  approximation. 
Once  these  choices  have  been  made,  we  can  then  proceed  to  comment  on  the  existence  and 
uniqueness  of  the  solution  to  the  approximating  problem. 

Our  criteria  for  selection  of  an  appropriate  form  is  strongly  colored  by  our  usage  of  these 
approximations  for  the  f-convolution  operations.  Thus,  our  emphasis  is  significantly  different 
from  that  in  texts  on  function  approximation.  We  will  emphasize  the  form  of  our  representations 
and  the  costs  of  computing  with  those  forms  more  than  we  will  discuss  norms  and  the  impact  of 
the  norms  on  the  choice  of  a representative  from  the  class  of  approximating  functions.  The 
reason  for  this  choice  is  simply  that  the  impact  of  the  norms  is  no  different  for  our  problems 
than  for  others  and  has  already  been  extensively  considered  in  the  literature.  On  the  other 
hand,  the  efficiency  and  accuracy  issues  for  calculating  f-convolutions  depend  critically  on  the 
form  of  the  approximating  function  and  have  not  been  previously  considered  in  the  literature  in 
a unified  way. 

Representation  by  Sampling  Techniques 

The  most  obvious  representation  is  merely  to  choose  evenly  spaced  points  along  the  pdf  and 
tabulate  their  values.  Depending  upon  the  techniques  used  to  interpolate  values  for  intermediate 
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points,  a range  of  representations  is  defined.  For  example,  if  the  value  of  a pdf  p at  a point  x 
is  chosen  as  the  value  at  the  nearest  tabulated  point,  then  we  are  approximating  p by  a mixture 
of  uniform  distributions  of  a constant  width. 

This  class  of  approximations  is  interesting  because  it  provides  representations  which  are 
easy  to  use  to  compute  f-convolutions.  In  particular,  if  we  are  interested  in  the  base  operations 
of  addition  and  subtraction,  then  the  corresponding  f-convolutions  are  convolution  and  correla- 
tion. The  recent  work  with  Fast  Fourier  Transforms  has  led  to  rapid  techniques  for  calculating 
these  results  [Stockham  69].  In  the  next  section,  we  will  describe  briefly  the  basic  computa- 
tional technique  and  in  the  following  section  indicate  the  major  error  sources. 

Convolution  by  FFT  Techniques 

The  Fast  Fourier  Transform  algorithms  are  actually  a set  of  algorithms  which  can  compute 
the  Discrete  Fourier  Transform  of  a series  of  N data  points  in  time  T = kN  log2  N.  These 
algorithms  became  well-known  following  the  publication  of  a paper  by  Cooley  and  Tukey  in  1965 
[Cooley  65],  although  the  ideas  can  be  traced  back  to  Runge  in  the  early  1900's  (see  [Cooley  67]). 
Recent  hardware  efforts  have  reduced  k to  values  as  small  as  500  ns  using  special-purpose 
hardware  [Allen  75],  while  values  of  60  ps  have  been  reported  on  machines  such  as  the  IBM  7094 
[Stockham  66  ]. 

As  first  described  by  Stockham  [Stockham  66],  the  FFT  algorithms  can  be  used  to  signifi- 
cantly speed  the  calculation  of  convolutions  and  correlations.  The  technique  depends  on  the  fact 
that  the  product  of  the  Discrete  Fourier  Transform  (DFT)  of  any  two  sequences  of  points  is  equal 
to  the  DFT  of  the  circular  convolution  of  the  two  sequences.  To  obtain  ordinary  convolution,  one 
must  pad  the  desired  points  with  a sufficient  number  of  zeros  so  that  the  circularity  is  irrele- 
vant. The  DFT  and  inverse  DFT  can  both  be  calculated  by  FFT  techniques  resulting  in  a signi- 
ficant time  savings  compared  to  standard  techniques  if  the  number  of  points  involved  is  medium 
sized  or  larger.  Stockham's  data  show  the  crossover  point  between  N = 24  and  N = S2. 

The  DFT  is  defined  as  the  discrete  analog  of  the  Fourier  transform,  namely 

N-l 

F(t)  = £ f(k)wkt,  t = 0,1,...,  N-l.  w = e2?ri/N  . 

k=0 

If  we  are  given  two  sequences  of  points,  f and  g (we  will  assume  f and  g are  both  sequences 
of  N points),  then  their  circular  convolution  is  defined  as 

N-l 

hc(l)  = ^ f(n)  g((l  - n)  mod  N) 

n=0 

It  may  be  shown  [Gold  69]  that  if  we  calculate  the  DFTs  of  f and  g,  and  multiply  these  and 

apply  the  inverse  DFT,  the  result  is  precisely  h^.  While  this  is  a complicated  way  to  perform 

a simple  calculation,  there  is  a significant  speed  advantage  which  justifies  the  effort.  The 

direct  way  of  calculating  h would  require  N multiplications  and  N-l  additions  for  each  value 

c 2 

of  1.  This  gives  a total  time  of  k^N  where  kc  is  the  time  to  perform  one  multiply/add  and 
associated  bookkeeping.  Calculation  of  the  DFT,  however,  requires  k^.N  log2  N time  as  docs 
calculation  of  the  inverse  DFT.  This  yields  a total  time  of  3kfN(log2  N + \ ) which  is  smaller 
for  large  N.  (These  calculations  arc  for  N a power  of  2;  similar  savings  hold  for  other  Ns.) 
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I*  or  work  with  probability  distributions,  we  are  not  interested  in  circular  convolutions,  but 
rather  in  "aperiodic"  convolutions  of  the  form 

N-l 

h(l)  = Yj  f(n)g(l-n)  . 
n=0 

One  way  to  achieve  this  effect  using  the  DFT,  is  to  use  a larger  vaiue  of  N.  That  is,  if  f 
and  g are  given  at  M points,  then  assume  their  value  is  zero  outside  of  that  range.  Extend  the 
sequences  with  enough  zeros  (M  is  always  enough),  and  perform  a circular  convolution  of  these 
extended  sequences.  A predictable  portion  of  the  resulting  circular  convolution  is  then  the  de- 
sired aperiodic  convolution  (see  (Stockham  69]  for  more  details). 

Error  Analysis 

Since  we  have  restricted  ourselves  to  absolutely  continuous  distribution  functions  in  this 
section,  their  pdfs  exist  almost  everywhere.  We  will  further  assume  that  our  pdfs  are  bounded 
on  (-«,  •).  Then  we  may  conclude  that  the  convolution  of  our  pdfs  exists  and  is  also  bounded 
on  (-«,  »)  [Apostol  57].  Let  us  examine  now  more  carefully  the  implications  of  calculating 
convolutions  by  FFT  techniques  when  we  are  talking  about  absolutely  continuous  distribution 
functions. 

The  first  thing  we  must  observe  is  that  the  use  of  FFT  techniques  in  the  calculation  does 
not  change  the  error  characteristics  significantly.  That  is,  once  we  have  sampled  our  two  in- 
put pdfs  f and  g at  some  chosen  points,  whether  we  calculate  the  convolution  of  these  sequences 
of  points  by  FFT  techniques  or  through  direct  inner  product  evaluations,  the  results  will  be 
numerically  identical  (remember  that  we  made  an  assumption  that  arbitrary  precision  arithmetic 
is  available;  in  fact,  the  round-off  error  propagation  of  the  FFT  implementations  has  been 
shown  by  experience  to  be  as  good  or  better  than  that  obtainable  by  summing  products 
[Stockham  66].  Further,  by  careful  choice  of  quantization  of  the  probability  values,  and  number 
of  samples,  one  can  take  advantage  of  FFT-related  techniques  such  as  the  Fermat  Number 
Transform  to  yield  no  round-off  errors  and  further  improved  speed;!  [Agarwal  75]). 

The  important  errors  in  the  technique  arise  rather  from  the  sampling  process  on  the  input 
pdfs.  Let  us  try  to  bound  these  errors  more  carefully.  If  wc  wish  to  calculate  the  convolution 
of  two  pdfs  f and  g,  then  we  wish  to  evaluate 

h(x)  = f f(y)  g (x  -y)  dy 
*--00 

and  this  exists  if  f and  g arc  absolutely  intcgrable  (true  for  all  pdfs)  and  bounded  on  (-«=,  «). 

The  first  thing  we  must  do  is  choose  an  upper  and  lower  bound  for  the  domain  of  each  of 
the  functions  f and  g which  repre0ent  the  limits  of  the  sampled  versions  of  these  functions.  If 
wc  choose  limits  for  f only,  and  these  limits  are  [a,  b],  resulting  in  ignoring  a weight  of  ef  in 
the  tails  of  f,  then  in  fact  we  are  evaluating 

hf(x)  = f f(y)  g(x  - y)  dy 
*-a 

The  error  caused  by  this  can  be  directly  calculated 
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h(x)  - hf(x)  = J3  f(y)  g(x  - y)  dy  + ^ f(y)  g(x  - y)  dy  . 

Sinee  both  integrals  are  positive  (f  and  g are  always  positive),  we  have 

J |h(x)  - hf(x)|  dx  = j*  [y  f(y)g(x-y)dy+ykf(y)g(x-y)dyjdx 

If  f(y)  g(x  — y)  is  a continuous  function  of  x and  y,  then  we  can  interchange  the  order  of  integration 
to  obtain 


p"  - pa  p00  p°° 

J |h(x)  - hf(x)  | dx  = j f(y)  J g(x-y)dxdy+J  f (y ) J 

Since  g is  a pdf,  J_x  g(x  — y)  dx  = 1 for  any  y.  So  we  have 

C |h(x)  - h.(x)|  dy  = f f(y)  dy  + f f(y)  dy 


g(x  -y)  dx  dy 


= ef  . 

Thus,  the  error  in  the  convolution  result  that  we  make  by  assuming  one  pdf  has  nr,  tails  is  given 
precisely  by  the  weight  in  those  tails  when  the  error  measure  is  an  Lj  norm.'' 

If  we  also  use  a tail-limited  version  of  g (call  it  g ),  then  the  error  we  make  is  similar, 
that  is 


f)  |hf(x) -hfg(x)|  dx  = eg 

where  eg  is  the  weight  in  the  ignored  tails  of  g.  Then,  by  the  triangle  inequality,  the  total 
error  between  h(x)  and  hfg(x)  is  bounded  by  + <rg,  the  sum  of  the  weights  of  the  ignored  tails. 

Once  we  have  chosen  an  upper  and  lower  bound,  we  must  next  choose  a sampling  interval 
which  is  the  same  for  f and  g if  we  use  Fl'T  techniques.  This  sampling  interval  is  then  used 
as  the  sampling  interval  for  h^.g.  Thus,  we  have 

pb 

hj.g(e  + kAt)  = \ f(y)  g(c  + kAt  -y)  dy 

are  the  exact  values  for  the  sampled  and  tail-limited  convolution  we  are  seeking.  Because  we 
are  sampling  the  input  with  the  same  At  as  the  desired  output  and  then  calculating  the  convolution 
of  two  sequences  of  points,  we  actually  calculate 

(b-al/At-i 

b *g(c  + kAt)  = | ^ f(a  + 1 At ) g(e  + kAt  - (a  + 1 At ) ) I At 

1 -0  J 


* The  L-n  family  of  norms  is  often  used  as  measures  of  distance  between  functions.  They  are 
defined  by: 


Ln(f,g)  = [j  |f(x) -g(x)|n  dxj 


1/n 
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and  tne  error  between  the  discrete  sum  and  the  integral  is  given  by 


hfg(c  + kAt)  -h*g(c  + kAt)  = y f(y)  g(c  + kAt  - y)  dy 

(b-a)/At-l 

- At  Yt  f(a  + lAt)  g(c  + kAt  ~ (a  + 1 At ) ) 

1 = 0 

This  difference  is  precisely  the  difference  between  the  Riemann  sum  representation  of  the  inte- 
gral and  the  integral  itself.  The  absolute  value  of  this  error  is  known  | Davis  75]  to  be  < (b  - a)  x 
u(At),  where 

u(At)  = max  | f(Xj ) g(c  + kAt  - x( ) - f(x2)  g(c  + kAt  - x2)  | 

| Xj  -x2  |<At 

when  f(x)  g(c  + kAt  — x)  is  continuous  in  x.  If  f(x)  g(c  + kAt  — x)  exists  for  all  k and  is 
bounded,  then  the  error  at  each  point  in  the  resulting  convolution  reduces  proportionally  to  At. 

If  instead  of  choosing  f and  g values  at  points  a + l.At,  we  choose  the  values  at  (a  + lAt  + At/2), 
and  w^fx)  = f (x)  g(c  4 kAt  — x)  has  continuous  second  nerivative,  then  the  error  at  each  k is 
given  by 

(b  - a)  (At)2  d wk(t’k) 
dx2 

where  a < 6^  < b.  The  total  error  for  the  tail-limited  cm..,  olution  is  then 


2(b-a)/At 

|hfg(c  + kAt)  — h*g(c  4-  kAt) | 
k=t 


2(b-a)/At 

V 

u 

k=l 


(b  - a)  (At) 


2 dS<(V 

dx2 


/24 


If  we  let 


A = max 


d Wk(t;k> 

dx2 


then 


2(b-a)/At 

y 

k=t 


|hj.^(c  + kAt)  - h|g(c  4-  kAt ) j •$  [ (b  - a)2  AAt]/l2 


Thus,  the  total  error  in  the  result  is  bounded  by  the  sum  of  the  errors  caused  by  removing  the 
tails,  and  the  errors  caused  by  sampling  in  the  intervals  that  remain. 


I h(x)  - h|g(x)  | 


dx  < e.  + c 
f g 


(b  - a)  A At 
12 


In  ordi  r to  derive  this  bound,  wc  needed  one  assumption  in  addition  to  those  constraints  on 
the  pdfs  we  had  at  the  beginning  of  the  section,  namely  that 
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“ 

5-  f(x)  g(C  h kAt  - x) 

dx 

exists  and  is  continuous.  This  is  a fairly  strong  assumption  for  probability  distribution  func- 
tions, but  I have  not  been  able  to  obtain  any  bounds  under  weaker  assumptions. 

Correlation 

The  preceding  development  was  expressed  in  terms  of  convolution.  In  fact,  the  computing 
techniques  and  error  bounds  for  correlation  tire  basically  identical.  Recognizing  that  X - Y is 
identical  to  X + (— Y),  we  need  merely  look  at  what  is  required  to  represent  the  negation  of  a 
random  variable  expressed  in  sampled  form.  The  negation  operation  implies  negating  and  re- 
versing the  upper  and  lower  bounds,  and  taking  the  sample  points  in  the  reverse  order.  The 
sampling  interval  remains  constant.  If  we  go  through  the  error  bounds  section  consistently 
replacing  g(c  t kAt  - x)  with  g(c  + kAt  + s ),  then  the  error  bounds  themselves  are  not  affected 
and  the  derived  results  all  apply  to  correlation,  when  computed  in  this  fashion. 

* -Convolution 

When  the  desired  base  operation  is  multiplication,  we  can  also  try  to  fake  advantage  of  FFT 
techniques  to  perform  the  calculation.  To  do  this,  wc  note  the  identity 


Y * Z = exp  1 In  Y t In  Z) 

Thus,  by  performing  three  unary  operations,  i.e.,  operations  on  one  distribution,  and  a 
^-convolution,  we  obtain  a ^-convolution.  Note  that  if  X In  Y,  then  we  have  (assuming  Y > 0 
to  avoid  difficulties  and  also  for  some  k,  J"  tkfy(t)  dt  exists  and  Z meets  the  same  conditions) 

fx(x)  = cxfy(ex) 


[Parzen  6 0 1. 

If  we  calculate  the  Fourier  transform  of  f^  in  terms  of  fy , we  obtain  the  following  expression: 


f*(s) 


, . isx  . 
.(x)  e dx 


■£<* 

- j exfy(cx)  elsx  dx 

ex(is4l)  fy(ex)d» 


let  t = e , we  have 


- f cas+1)  f (t ) ^ 
Jq  * 1 

f£(s)  J tlsfy(t)  dt  . 


Checking  in  our  tables  (Bateman  54)  we  find  that,  with  a change  of  variables  p 1 - is,  we  have 
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fyV)  = f£(l  - iS) 

where  f^1  is  the  Mellin  transform  of  f,r  Thus,  if  we  take  the  Mellin  transform  of  the  distribu- 
tions of  our  two  random  variables,  multiply  these  transformed  distributions  and  ealeulate  the 
inverse  transform,  we  find  that  we  have  performed  a * -convolution  (see  [Ditkin  65]  for  more  on 
the  properties  of  the  Mellin  transform). 

This,  of  course,  is  only  useful  if  these  transforms  ean  be  performed  effieiently,  using 
FFT-type  techniques.  Of  course,  we  have  just  seen  the  necessary  triek,  i.e.,  ealeulate  the 
pdfs  corresponding  to  In  Y and  In  Z and  take  their  Fourier  transform.  However,  the  aceuracy 
begins  to  degrade  because  a resampling  must  oceur  to  keep  the  sample  points  evenly  spaeed 
be  h before  and  after  tl In  operation.  I do  not  know  if  FFT-type  techniques  ean  be  direetly 
employed  on  t lie  original  pdf  to  ealeulate  its  Mellin  transform  and  its  inverse  directly. 

Other  f-Convolution  Operations 

In  general,  if  h-convolution  for  an  arbitrary  base  function  h is  desired,  we  have  the  follow- 
ing equation: 


F(t)  = j...j  f(x1...xn)dx1...dxn 
3h(xt.  . . xn) « t 

where  f is  the  point  distribution  function  of  Xj.  . . Xn-  This  multiple  integral  can  be  direetly 
attacked  by  numerical  integration  techniques,  if  necessary  (see  [Stroud  71)  for  a survey  of 
techniques).  These,  however,  are  likely  to  be  more  expensive  and  less  aeeurate  in  computation 
than  schemes  tuned  to  a particular  f-convolution. 

Other  pdf  Representations 

There  are  a large  number  of  other  ways  to  represent  pdfs  in  addition  to  the  sampling  tech- 
niques described  above.  These  divide  into  two  major  elasses:  (1)  representation  by  functions 
of  a Special  class,  and  (2)  representation  by  series  expansion.  We  discuss  below  an  example 
of  each  type.  These  particular  ehoices  are  well  known  in  the  statistical  literature,  but  many 
examples  of  each  type  ean  be  found,  eaeh  with  its  own  advantages  and  disadvantages. 

One  well-known  speeial  class  of  pdfs  which  can  assume  a broad  variety  of  shapes  and  is 
thus  useful  for  approximation  is  the  Pearson  family  of  pdfs  [Kendall  63].  These  pdfs  can  all 
be  characterized  by  the  following  differential  equation,  where  f is  the  pdf 

df  _ (x  - a)  f 
dx  . , . T T 

bp  + bjX  + b^x 

It  is  simple  to  deri  e from  this  equation  a number  of  properties  of  f. 

(1)  df/dx  vanishes  at  the  point  x = a,  and  only  at  that  point.  Thus  these 
distributions  arc  uni-modal,  but  in  speeial  cases,  they  may  be  ,J- 
shaped  or  U-shaped. 

(2)  There  is  (except  for  special  cases)  smooth  contact  with  the  x-axis  at 
the  extremities,  so  that  df/dx  vanishes  when  f = 0. 


(3)  The  parameters  a,  bQ,  t>1(  and  b2  can  all  be  determined  from  the 
first  four  moments. 

(4)  The  Normal,  Beta,  Chi-square,  Student's  t-,  and  Gamma  pdfs  are 
all  special  cases  of  the  Pearson  family. 

Since  a Pearson  pdf  can  be  detei  mined  from  its  first  four  moments,  tt  .*  relations  between 
the  defining  parameters  and  the  first  four  moments  can  be  used  to  move  from  one  to  the  other. 
That  is.  given  the  first  four  moments,  one  ean  determine  the  parameters  a,  bQ,  b^,  and  b^  de- 
fining the  Pearson  pdf,  and  conversely,  given  a,  bQ,  bj,  and  b2>  one  can  calculate  the  first 
four  moments.  Indeed,  given  a Pearson  pdf,  there  is  a recurrence  relation  for  the  moments 
which  will  calculate  moments  of  higher  orders  easily,  but  the  first  four  determine  all  of  them. 

If  two  Pearson  pdfs  f and  g are  convolved  to  form  a pdf  h,  h is  not,  in  general  Pearson,  but 
a Pearson  representative  which  matches  it  in  the  first  four  moments  can  be  determined.  (This 
can  be  demonstrated  by  comparing  the  higher  order  moments  (p^,  p^,  etc.)  of  the  Pearson 
representative  for  h with  the  values  for  the  true  li  for  some  numerical  examples.) 

Generally,  there  is  no  advantage  to  using  the  Pearson  family  in  place  of  representation  by 
the  first  four  moments  (see  di  cussion  of  representation  by  moments,  below).  The  moments 
are  easier  to  compute  with  for  f-convolutions  and  easier  to  obtain  in  the  first  place. 

Another  type  of  representation  for  pdfs  is  in  terms  of  the  coefficients  in  a series  of  ortho- 
gonal polynomials.  One  of  the  best-known  of  these  schemes  is  the  Gram-Charlier  Type  A 
series  | Kendall  63).  This  is  an  expansion  in  terms  of  Tchebycheff-llorniite  polynomials. 

Ihe  major  properties  of  the  Gram-Charlier  series  for  use  in  a statistical  compiler  are: 

(1)  The  coefficients  arc  easily  calculated  from  the  moments,  and  the 
moments  may  be  easily  calculated  from  the  coefficients, 

(2)  It  is  easy  to  convert  from  a pdf  in  this  form  to  a cdf  ir  this  form, 
and  conversely, 

(3)  The  sum  of  a finite  number  of  terms  of  the  scries  may  produce  nega- 
tive values  for  the  probabilities  near  the  tails, 

(4)  A series  of  n terms  may  be  a poorer  fit  than  a series  of  n-1  terms, 
and 

(5)  [f  there  is  a significant  skew  in  the  pdf,  then  the  fit  will  probably  be 
poor. 

f-convolutions  would  be  performed  on  the  pdfs  by  manipulating  the  coefficients  in  the  Gram- 
rharlier  representation  of  the  pdfs.  Closed-form  relations  between  the  coefficients  can  be  ob- 
tained for  cases  when  the  relations  between  the  moments  are  known:  for  the  case  of  + -convolution, 
these  relations  are  particularly  simple.  If  our  representation  is  in  terms  of  Ihe  Gram-Charlier 
series  about  the  mean,  and  the  representation  for  X is  given  by  <x,  xQ,  x( ...  >,  where  x is  the 
mean  of  the  random  variable  X,  and  x^  is  the  coefficient  of  the  i ' ^ term  in  the  series  expansion, 
then  for  Z ; X t Y,  (X  and  Y independent)  we  have  the  following  relations: 

z = x t y 
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Z2  = x2  + y2  + 1/2 

z3  = x3  + y3 

z4  = x4  + y4  + x2y2  + 1/8 

Ihis  is  obtained  from  combining  the  expressions  for  the  Gram-Charlier  coefficients  ((Kendall  63, 
p.  157])  with  expressions  for  the  moments  of  Z ir  terms  of  the  moments  of  X and  Y. 

For  ''-convolution,  the  same  approach  can  be  applied  to  obtain: 

z = xy 
zQ  = i 

Zj  = 0 

z2  = 2x2y2  +x2  +y2  + 1/2 

z3  = 6x3y3 

z4  = 24x4y4  + 12x4y2  + 12x2y4  + 10x2y2  + 3x4  + 3y4  + x2  + y2  + 1 /4 


these  expressions  are  relatively  easy  to  deal  with,  although  *he  number  of  terms  begins  to  get 
unwieldy  for  "‘-convolution  with  higher  order  series. 

Given  the  basic  accuracy  .imitations  on  Gram-Charlier  series  outlined  abo\  e,  they  can  only 
be  recommended  where  the  ease  of  conversion  to  cdf  representations  is  important,  and  "nice" 
pdfs  are  used.  Usually,  one  of  the  other  forms  will  be  more  convenient  for  computation  and 
avoid  the  accuracy  limitations  inherent  in  Gram-Charlier. 

Representation  by  cdf 

In  order  to  obtain  error  bounds  in  the  discussion  of  convolution  above,  we  had  to  make 
assumptions  which  are  hard  to  justify  regarding  the  smoothness  and  differentiability  of  our 
pdfs.  Generally,  trying  to  represent  and  compute  with  functions  which  are  not  smooth  is  sub- 
ject to  many  pitfalls  and  should  be  avoided  where  possible.  For  this  reason,  an  attractive  al- 
ternate representation  is  the  distribution  function,  also  referred  to  as  the  cumulative  distribu- 
tion function  (cdf)  to  emphasize  the  distinction  from  the  pdf  more  strongly. 

Ihe  choices  for  representation  of  the  cdf  are  similar  to  the  choices  for  representing  pdfs, 
ranging  from  sampling  to  series  and  continued-fraction  expansions.  We  discuss  below  what 
some  operations  on  random  variables  imply  about  operations  on  the  corresponding  cdf  and 
leave  the  implementor  to  choose  the  most  convenient  representation  for  the  expected  cdfs. 

The  first  thing  to  note  is  that  the  calculation  of  the  + -convolution  for  two  random  variables 
cannot  be  expressed  directly  in  terms  of  the  cdfs  of  those  random  variables. 
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Fz(z)  = P (X  + Y < z] 
I00  pz-x 


p°°  pz-x 

j j fx,  y(x'  y)  dy dx 


Assuming  X and  Y are  independent,  we  have 

^Z-X 


p°°  pZ-X 

Fz(z)  = J fx(x)  J fy(y)  dy  dx 


•r 


fx(x)  Fy(z  - x)  dx 


Computing  this  requires  calculating  and  then  forming  its  convolution  with  Fy,  but  I know  of 
no  way  to  avoid  the  calculation  of  the  pdf  of  one  of  the  random  variables. 

However,  there  are  other  operations  among  random  variables  which  translate  quite  di- 
rectly into  operations  on  their  cdfs.  For  example,  consider  Z = max  (X,  Y). 

Fz(z)  = P (max (X,  Y)  « z] 

= P ( (X  < z ) and  ( Y < z )] 

Assuming  X and  Y are  Independent,  we  have 
Fz(z)  = Fx(z)  Fy(z)  . 

Similarly,  for  Z = min  (X,  Y),  we  obtain 

FZ(Z)  = FX(Z)  + FY(z)  ~ FX(Z)  Fy(z)  ’ 

Note  that  performing  max-convolution  with  pdfs  would  require  calculating  both  cdfs. 

Another  operation  which  is  particularly  convenient  with  cdfs  is  the  following  (expressed  in 
ECL): 

CHOICE  < — EXPR(X:  real\random, 

Y:  real\ranuom, 

Z:  real\random, 
k;  REAL; 
real\random) 

()  X LE  k =>  Y;  Z (]; 

W <—  CHOICEfX,  Y,  Z,  k); 

Then  we  have  assuming  independence  of  our  random  variables, 

FW<w>  = P|((X«  k)  and  (Y  < w))  OR  ((X  > k)  and  (Z«  w))] 

= Irx(k)  Fy(w)  + (1  - Fx(k))  lrz(w) 

Generally,  when  there  are  many  conditional  operations  to  be  performed,  especially  those 
involving  comparison  with  REALS,  the  cdf  is  the  more  convenient  form  to  use. 


Moments 

Statisticians  have  long  used  moments  to  describe  succinctly  the  characteristics  of  a distri- 
bution. Such  measures  as  variance,  skewness,  and  kurtosis  are  related  to  the  moments  about 
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the  origin  or  about  the  mean  in  direct  and  simple  ways.  Moments  have  the  additional  advantage 
from  our  viewpoint  of  being  particularly  convenient  for  certain  f-convolution  operations.  In 
many  cases,  the  answer  desired  by  the  poser  of  the  original  problem  is  expressed  directly  in 
terms  of  moment  measures,  i.e.,  what  is  the  mean  and  standard  deviation  of  X?  We  must  be 
wary,  however,  because  not  all  distributions  have  moments  of  all  orders  and  for  certain  cases, 
no  moments  at  all  exist.  In  the  following,  we  will  assume  that  the  input  distributions  we  use 
have  moments  of  all  the  degrees  that  we  use  to  represent  the  distributions. 

An  f-convolution  which  is  particularly  simple  to  perform  with  a moment  representation  is 
-convolution.  I’or,  if  Z = X * Y,  we  have  the  following 

E [Zn]  = E |(XY)n] 

3 1.  £00  (xy)nFx,y,>:'y)  ****  • 

Assuming  X and  Y are  independent,  we  have 

/!*>  /V30 

E |Zn]  = J oo  j X"Ax(x)  fY(y>  dx  d* 

poo 

= J ynfy(y)  J xnf^(x)  dx  dy 

= J ynfY(y)  E [Xn]  dy 
= E |Xn]  E f Yn]  . 

Thus,  if  X and  Y have  moments  to  degree  m,  then  so  will  Z.  In  words,  the  moments  of  a 
product  of  independent  random  variables  are  the  product  of  the  moments  of  those  random 
variables. 

Equally  simple  is  the  form  Z = Xk  for  k a positive  integer.  Here  we  have  directly  that 
E [Z  ] = E [X  ].  In  this  case,  Z will  have  moments  to  degree  m if  X has  moments  to  de- 
gree km. 

Since  E IgjiX)  + g2(X)]  = E |g1(X)J  + E [g2<X)J,  we  can  extend  these  results  to  obtain,  if 

z = 2 2 2 aijkxivJwk 
* j k 

then  we  have,  assuming  independence  of  X,  Y,  and  W, 

E lzJ  = 2 2 2 aijkE  [XlJ  E[YJj  E | Wk]  . 

» j k 

To  obtain  the  higher  moments,  expand  Zn  symbolically  to  obtain 

zn  = 2 2 2 bijkxVwk  • 
i j k 
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Then, 


E [Z"l  = S S 2 bijkE  Ixl|  E !Yj|  E [Wk)  , 

i j k 

Thus,  we  can  evaluate  the  moments  of  a multinomial  directly  from  the  moments  of  its  input 
distributions,  assuming  independence  of  those  input  distributions  and  existence  of  moments  of 
adequately  high  degree.  In  the  above  example,  if  the  multinomial  for  Z is  of  degree  k in  X, 
then  evaluation  of  E[Zn|  requires  the  existence  of  moments  of  degree  up  to  kn  for  X. 

Characteristic  Functions 

The  characteristic  function  is  the  continuous  Fourier  transform  of  the  pdf;  it  is  also  the 
moment  generating  function.  Whenever  the  pdf  exists,  so  does  the  characteristic  function, 
although  an  arbitrary  function  is  not  in  general  the  characteristic  function  for  some  random 
variable.  The  characteristic  function  has  some  convenient  properties  for  f-convolution,  namely 
if  \x  and  \y  are  the  characteristic  functions  for  the  random  variables  X and  Y,  then  \^+  , = 
*X^y  *s  **le  characteristic  function  for  their  sum,  assuming  independence. 

Note  that  the  characteristic  function  representation  was  used  to  perform  convolution  foe 
sampled  pdfs.  That  is,  to  perform  convolution,  we  first  took  the  DFT  of  the  input  pdfs  to 
obtain  an  approximation  to  the  characteristic  function  and  then  used  the  property  indicated 
above  to  perform  the  convolution.  If  a sequence  of  +-convolutions  are  to  be  performed,  then  it 
is  more  efficient  overall  to  leave  the  pdfs  in  the  characteristic  function  form  and  only  convert 
back  to  sampled  form  after  the  entire  sequence. 

Atomic  Distributions 

We  say  a distribution  F is  atomic  if  F is  concentrated  on  its  set  of  atoms,  i.e.,  if  the 
atoms  of  F contain  the  complete  mass  of  the  distribution.  It  is  easy  to  show  that  there  are  at 
most  denumerably  many  atoms  for  any  atomic  distribution. 

Just  as  in  the  absolutely  continuous  case,  there  exists  an  associated  function,  the  proba- 
bility mass  function  (pmf)  p,  such  that 

F(x)  = V p(a.) 
a^<x 

where  a.  ranges  over  the  set  of  atoms  for  F which  are  less  than  or  equal  to  x. 

Tae  pmf  then  maps  the  atoms  a.  of  F into  the  probability  mass  associated  with  those  atoms, 

i.e.,  the  amount  of  increase  of  1 at  a.. 

1 

The  representation  choices  for  atomic  distributions  are  similar  in  many  ways  to  the  choices 
for  absolutely  continuous  distributions.  For  representations  such  as  cdf  or  moments,  the  dis- 
cussions already  presented  apply  equally  well  here. 

However,  for  other  representations,  such  as  pmf  or  characteristic  functions,  some  modi- 
cations  are  necessary.  Also  there  are  some  important  subclasses  of  the  atomic  distributions. 
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where  some  specific  techniques  can  be  employed.  These  subclasses  are  the  finite  atomic  dis- 
tributions (those  with  a finite  number  of  atoms),  and  the  integer  distributions  (all  atoms  are 
integers). 

We  discuss  below  some  of  the  effects  of  these  differences. 

Representation  by  pmf 

The  pmf  may  be  compared  with  the  pdf  representations  described  earlier  for  absolutely 
continuous  distributions.  The  pmf  by  its  nature  is  in  a sampled  form,  but  there  is  no  guarantee 
that  the  spacing  is  uniform,  or  that  only  a finite  set  of  samples  can  provide  a complete 
representation. 

Moreover,  each  atom  represents  only  its  own  point  in  pmf,  whereas  in  a sampled  pdf  each 
sample  point  represents  an  interval.  This  effect  is  most  striking  in  Poking  at  unary  opera- 
tions: e.g.,  consider  the  following  theorem  (Parzen  60). 

If  y = g(x)  is  differentiable  for  all  x,  and  either  g'(x)  > 0 for  all  x or 
g'(x)  < 0 for  all  x,  and  if  X is  a continuous  random  variable,  then 
Y = g(X)  is  a continuous  random  variable  with  probability  density  func- 
tion given  by 

fy(y)  = fx<g  *(y))  I ^ g"1<y)  I if  ye  range  of  g 
= 0 otherwise. 

For  contrast,  the  relationship  between  pmfs  is  quite  different,  namely 
fy(y*  =fx(g~1(y')  if  ye  range  of  g 
= 0 otherwise 

since  we  are  only  interested  in  the  mapping  of  individual  atoms. 

If  there  are  too  many  atoms  in  the  distribution  to  store  them  individually,  then  some  group- 
ing of  atoms  must  be  performed.  The  effects  of  this  groupit.g  on  f-convolution  operations  will 
depend  on  the  nature  of  the  distribution  and  the  spacing  between  the  atoms.  If  the  atoms  are 
"dense  enough,"  then  the  distribution  maybe  approximated  as  a continuous  distribution,  how- 
ever, the  exact  specifications  for  "dense  enough"  will  depend  on  the  application. 

Although  the  number  of  atoms  in  the  input  distributions  may  be  small,  this  docs  not  preclude 
the  possibility  of  generating  large  intermediate  results.  For  example,  if  the  distributions  for 
X and  Y contain  n atoms  each,  then  the  distribution  for  X + Y will  contain  anywhere  from  2n 
to  n atoms  depending  on  the  spacing  of  the  atoms  in  X and  Y. 

Thus,  it  may  be  necessary  to  dynamically  alter  the  grouping  of  atoms  into  sample  points  in 
order  to  maintain  a feasible  number  of  points  in  intermediate  distributions,  although  again  the 
error  effects  are  difficult  to  predict. 

Integer  Distributions 

The  case  of  integer  distributions  is  of  particular  practical  importance.  Much  of  the  discus- 
sion under  probability  mass  functions  also  applies  to  integer  distributions,  but  there  is  an  addi- 
tional technique  which  is  of  particular  interest  in  this  case,  namely  representation  by  generating 
functions. 
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The  basic  concept  is  to  represent  ‘he  probability  values  as  the  coefficients  in  a polynomial. 
The  entire  polynomial  then  becomes  the  re  v esentation  for  the  distribution,  or  its  generating 
function. 

That  is,  let 

A(s)  = p(0)  + p(l)  s + p(2)  s2  +. . . 

then  for  pmfs,  A(s)  converges  absolutely  for  |s|  1 

Some  of  the  properties  of  generating  functions  make  them  particularly  convenient  for 
f-convolution  operations,  tor  example,  if  A(s)  and  B(s)  are  the  generating  function  represen- 
tation for  the  random  variables  X and  Y,  then  we  have  A(s)  * B(s)  as  the  generating  function 
representation  for  X + Y. 

However,  the  most  interesting  application  of  generating  functions  in  the  statistical  compiler 
area  is  for  evaluating  the  following  function  of  random  variables,  expressed  as  an  ECL  program. 

SUM  — EXPR  (X:  real\random, 

N:  realsrandom; 
realsrandom) 

[)  DECL  Z:  realsrandom  BYVAL  point  (0); 

/*  ' Z is  initialized  to  a distribution  whose  only  atom  is  0 
FOR  i FROM  1 TO  N 
REPEAT 

DECL  S:  real\random  11D  X; 

/*  ' S is  a series  of  independent  distributions  all  with 
identic  il  distribution  functions  which  a~e  the  same  as  the 
distribution  function  for  X 
Z - Z + S; 

END; 

z; 

0; 

That  is,  SUM  calculates  the  distribution  of  the  sum  of  N values  independently  chosen  from 
the  distribution  X,  where  N is  a random  variable.  The  generating  function  representation  of 
this  is  surprisingly  simple  [Feller  57,  Vol.  1,  p.  268):  if  A(s)  is  the  generat  ig  function  for  N, 
and  B(s)  is  the  generating  function  for  X,  then  A(B(s))  is  the  generating  function  for  SUM(X,  N). 

Remember,  however,  the  difficulties  imposed  by  the  addition  of  looping  capabilities  to  our 
set  of  base  functions.  We  are  able  to  avoid  the  problem  only  because  the  restrictions  for  prac- 
tical computations  are  so  severe.  If  either  X or  N is  represented  as  an  infinite  sequence  of 
coefficients  of  a generating  function,  then  the  computation  will  never  terminate,  it  concludes 
only  if  the  generating  functions  for  both  X and  N can  be  expressed  in  closed  form,  i.e.,  if  X 
is  a Poisson  distribution  with  A,  then  its  generating  function  is  e_X+Xs.  Thus,  although  this  is 
a convenient  way  to  represent  fhe  work  to  be  performed,  it  still  requires  a potentially  infinite 
amount  of  computation. 

Conversion  Between  Representations 

Since  it  is  probable  that  the  implementor  of  a statistical  compiler  will  wish  to  provide  a 
broad  variety  of  base  functions,  and  since  the  work  required  to  perform  a particular  f-convolution 
varies  dramatically  with  representation,  it  is  sometimes  desirable  to  change  the  representation 
during  the  computational  process.  In  some  cases,  these  conversions  are  simple  to  perform 
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rmmerieally  and  can  be  done  with  high  accuracy,  However,  some  conversions  have  theoretical 
bounds  on  their  accuracy  and  these  bounds  may  be  large  in  practical  cases.  The  purpose  of  this 

section  ,s  to  mdicate  the  nature  of  the  conversions  between  the  types  of  representations  dis- 
cussed above. 


From  pdf  to  cdf 

Since  the  cdf  is  the  integral  of  the  pdf.  and  integration  is  a smoothing  operation,  the  results 
can  be  expected  to  be  satisfactory.  Depending  upon  the  representations  used  for  the  pdf  and  cdf 
various  numerical  integration  techniques  apply.  See  (Davis  75]  for  a survey  of  techniques  which 
could  be  used.  Lor  certain  representations,  e.g..  the  Gram -Charlier  series,  the  conversion 
can  be  s.mply  performed  in  terms  of  the  parameters  describing  the  series. 


1 rom  pelf  to  Moments 

If  the  representation  of  the  pdf  is  a,  a P„,„0„  di„rlbut,„n  or  „ Cam-Charlier  series 
theoretical  expressions  for  the  moments  ran  be  obtained  directly  from  the  parameter,  of  ,,,'e 
representation  which  are  convenient  for  calculation. 

When  a sampling  representation  of  the  pdf  is  used,  the  obvious  calculation  for  the  moments 
is  to  evaluate 


(b-al^At 


ni 


V 

k- 1 


(a  t-  kAt)m  f(a  + kAt  - At/2) 


A hough  this  inner  product  can  be  directly  evaluated  with  appr,  priate  numerical  care 
(|U  Ukinson  63]>,  there  remain  a number  of  practical  problems.  For  example,  if  the  unsampled 
secuons  in  the  tails  of  the  distribution  have  enough  weight,  then  any  moment  calculated  from  the 
sampled  version  will  he  unreliable,  .ndeed,  the  moments  calculated  from  the  sampled  distri- 
bution always  exist,  whereas  the  moments  of  the  infinite  range  distribution  from  which  it  is 
deriveq  uo  not  necessarily  exist.  Further,  even  in  cases  where  this  is  not  an  issue,  we  are 
tolrvll  " aSSUIr‘Ptlon  that  weight  in  an  interval  At  is  concentrated  at  the  center  of  the 

ofJ*r?roblemS  are  the  Same  aS  thOSe  aHSing  When  °ne  attempts  to  estimate  the  moments 
f a distribution  from  a sample  selected  from  the  population  described  by  that  distribution,  and 

hat  sample  has  been  grouped.  ,f  the  distribution  is  quite  well  behaved  ,to  be  specified  shortly, 

en  this  calculation  can  be  accurately  performed,  with  Sheppard's  corrections  used  to  correct’ 

or  the  grouping  effect  ( Kendall  6 3].  The  conditions  for  application  of  Sheppard's  corrections 
are,  however,  fairly  stringent. 


,rd:, : * •»*-  - 


. ,k  . ..  m ^nr  ..aw,  a nuuiuLT  oi  concmior 

atth  H V = X f<X)'  thenthefirst2m- ^ derivatives  of  v should  approximately  vaninh 
dt  the  ends  of  t.llP  n „ r ..  , . ... 


at  the  ends  of  the  sampled  interval  (i.e.,  f must  have  high-o  r contact  with  the  axis  in  its 
ails),  (2)  y must  have  2m  derivatives,  and  (3)  4y,2m)  (e)  (b  a)  At2171’1  lor  some  - 

the  interval  (a.  b]  provides  a measure  of  the  error  in  the  calculation,  and  should  be  small 


1 rom  pdf  to  Characteristic  Function 

The  calculation  of  the  characteristic  function  from  the  pdf  is  merely  the  calculation  of  the 
continuous  Louner  transform  of  the  pdf.  Fast  Fourier  transform  techniques  can  be  used  to 
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convert  rapidly  between  sampled  representations  of  the  pdf  and  the  characteristic  function,  and 
conversely.  The  errors  caused  by  converting  a continuous  curve  to  a sampled  version,  and 
then  calculating  a discrete  l'ourier  transform  and  using  it  as  an  approximation  to  the  continuous 
Fourier  transform  are  well  known  and  discussed  in  several  articles  and  texts  (e  g.,  |Cooley  70]). 

Generally,  this  conversion  will  work  satisfactorily  if  (1)  the  pdf  does  not  have  very  "high- 
frequency"  components,  and  (2)  the  characteristic  function  is  desired  for  relatively  small 
values  of  its  argument.  If  we  sample  the  pdf  at  interval  At,  then  the  characteristic  function 
obtained  by  FFT  techniques  can  be  in  error  by  100  percent  when  its  argument  is  l/(2At)  (Nyquist 
frequency)  because  of  sampling.  This  "aliasing"  error  will  be  acceptably  small  only  if  the  value 
of  the  characteristic  function  is  negligible  at  this  point.  In  addition,  chopping  off  the  tails  of  the 
pdf,  which  is  a multiplication  in  the  original  domain,  represents  a convolution  operation  in  the 
transformed  domain.  The  function  convolved  with  is  of  the  form  (sinx)/x,  producing  significant 
errors  in  the  characteristic  function  referred  to  as  "leakage"  errors.  These  "leakage"  errors 
can  be  compensated  for  by  using  a "window"  function  on  the  original  pdf  which  is  not  a rectangular 
data  window,  but  rather  some  'unction  with  more  tapered  ends.  At  this  stage,  the  selection  of 
an  appropriate  window  is  very  much  an  art  and  is  discussed  in  |Bergland  69). 


From  Moments  to  cdf 

The  problem  of  conversion  from  moments  to  the  cdf  and  pdf  forms  of  representation  is  a 
classical  mathematical  problem,  and  lias  been  attacked  by  such  mathemat  cians  as  Tchcbvcheff, 
Markoff,  Stieltjes,  Hausdorff,  and  Hamburger.  The  "Sticltjcs  integral"  was  introduced  in  1894 
in  a paper  which  also  provides  a solution  to  one  variant  of  the  problem  of  moments. 

This  wealth  of  mathematical  interest  derives  from  the  fact  that  the  problem  is  hard  enough 
to  be  interesting,  yet  fruitful  enough  to  produce  many  fascinating  results.  From  our  viewpoint, 
however,  these  results  are  essentially  negative,  and  wc  examine  their  relation  to  our  problem 
below. 

The  first  variant  of  the  moment  problem,  referred  to  as  the  Hamburger  moment  problem, 
assumes  we  arc  given  an  infinite  sequence  of  numbers  and  asked  whether  these  are  the  moments 
for  some  distribution  function  on  the  infinite  interval  ( ■*>,  ■*  ),  and  whether  we  can  find  that 
distribution  function.  To  state  that  more  carefully,  wo  offer  the  following  theorem: 

Theorem:  (see  |Shohat  43,  Theorem  1.2]) 

In  order  that  a Hamburger  moment  problem 

xn  = f tndl  , n 0,  1,2,... 

^ - on 


shall  have  a solution,  a distribution  function  F,  it  is  necessary  that 


x0  X1 


n I 


0 


n 0,  1,  2,  . . . 


xn  xntl  x2n 
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In  order  that  there  exist  a solution  which  is  an  atomie  distribution  whose  set  of  atoms  eon 
sists  of  exactly  (k  + 1)  distinct  points,  it  is  necessary  and  sufficient  that 


Aq  >0,  Aj  > 0, 


Ak>°, 


^+1  = Ak+2  ‘ 


= 0 


The  moment  problem  is  determined  in  this  case. 

The  condition  stated  here  is  expressed  in  terms  of  an  infinite  sequence  of  moments,  and  further, 
even  if  true,  does  not  guarantee  that  the  function  F is  unique.  Indeed,  counter-examples  can  be 
found,  e.g.,  (Kendall  63],  since 


xn  exp(-axA)  sin(bxA)  dx  = 0 a > 0,  0<A<l/2 

the  pdfs 

f(x)  = k exp  (-ax  ) (1  +•  t sin  (bxA )],  0 < x < a>0,  o < A < 1 /2,  | e | < 1 

have  the  same  infinite  set  of  moments  for  all  e in  the  allowed  range. 

An  additional  theorem  gives  the  condition  for  a substantially  unique  distribution  with  a given 
set  of  moments. 


Theorem:  [Shohat  43,  Theorem  1.10] 

A sufficient  condition  that  the  Hamburger  moment  problem  be  determined  (i.e.,  have  a 
"substantially  unique"  solution)  is  that 


n=l 

Again  this  is  a condition  on  an  infinite  sequence  of  moments  although  we  have  only  a finite  se- 
quence and  the  numerical  evaluation  of  such  a test  is  prone  to  significant  errors. 

If  wc  restrict  ourselves  to  a finite  interval,  say  (0,  1],  and  consider  the  reduced  moment 
problem  (t.e.,  only  a finite  number  of  moments  are  given),  the  situation  is  only  partially  im- 
proved. If  we  are  given  a sequence  of  moments  xQ,  x, xn>  then  we  can  determine  effec- 

tively two  values  for  xnf  { which  represent  the  upper  and  lower  bounds  for  the  next  moment. 

We  can  further  determine  atomic  distributions  with  about  n/2  atoms  which  achieve  both  the 
upper  and  lower  bounds  on  xn  + 1 (call  these  Fu  and  I ,).  If  D|x  represents  the  set  of  cdfs  having 
the  moments  x^.x^,.  . . , xn>  then  we  have  several  theorems  |Karlin  53]. 

Oefinition:  Let  Ojx  represent  the  subset  of  D|x  consisting  of  atomic  distributions  only. 
Theorem:  The  sets  D[x  and  O^Jx  arc  convex. 

Theorem:  The  extreme  points  of  Djx  are  those  functions  F with  the  number  of  distinct 
atoms  ^ n + 1. 

Theorem:  Djx  is  spanned  by  its  extreme  points,  and  in  the  weak  * topology  (i.e.,  using 

a particular  measure  of  distance  between  cdfs),  D|x  is  also  spanned  by  the  extreme  points 

of  D lx. 
a 1 

Theorem:  If  F e d|x,  then  the  differences  F - I’u,  and  F - Fy  if  not  identically  zero,  each 
have  exactly  n - 1 sign  changes  in  the  interval  [0,  1], 
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An  example  should  help  clarify  the  situation.  Consider  the  Beta  distribution  with  param- 
eters 12  and  6. 

dF  = B(iZ,b)  xll(1  ~x)5  0 < x < 1 . 

The  moments  for  this  distribution  may  be  easily  calculated  by  integration,  using  the  properties 
of  the  complete  Beta  integral  to  obtain  the  following  values: 

p|  = 0.6666,667 
\i'z  = 0.4561,404 
p}  = 0.3192,982 
P4  = 0.2280,702 

Following  [Karlin  53]  we  can  then  determine  Fu  and  Fj  to  have  the  same  first  four  moments. 

The  work  required  to  calculate  these,  given  n moments,  is  approximately  the  cost  of  evaluating 
n determinants  of  size  n/2,  finding  all  the  roots  of  two  polynomials  of  degree  n/2  (roots  are 
guaranteed  to  be  real,  and  in  [0,  1]),  and  finally  solving  two  sets  of  linear  equations,  each  of 
size  n/2.  Performing  the  calculations  we  obtain  the  following:  F has  atoms  at  0.0,  0.5612  724 
and  0.7720,849  with  jumps  of  0.0013,5098,5,  0.4951,089,  and  0. 5035,401,  respectively.  F has  ’ 
atoms  at  0.5104,210,  0.7276,077,  and  1.0,  with  jumps  of  0.3010,400,  0.6826,570,  and  0.0163,031, 
respectively.  The  reader  may  readily  verify  that  these  distributions  have  the  same  first  four 
moments  as  those  given  above.  The  situation  is  graphed  in  Fig.  5-2  (values  for  the  Beta  distri- 
bution were  obtained  from  [Pearson  68]).  As  can  be  seen,  the  functions  are  not  close  approxi- 
mations to  each  other,  and  the  pdfs  would  be  even  more  disparate.  'rhe  intertwining  behavior 
seen  in  the  graphs  is  typical,  and  the  theorem  above  states  that  any  other  cdf  with  the  same 
first  four  moments  must  also  cross  every  horizontal  and  vertical  line  in  the  graphs  of  F and  F . 

Thus,  while  we  can  bound  the  behavior  of  cdfs  from  descriptions  of  their  moments,  “hese  1 
bounds  are  necessarily  gross  and  often  in  an  inconvenient  form  for  further  computing.  Several 
authors  (e.g.,  [Burr  42])  have  suggested  techniques  for  directly  fitting  cdfs  to  a given  set  of 
moments,  but  these  techniques  have  depended  on  first  choosing  a family  of  cdfs  and  then  deter- 
mining the  parameters  of  the  specific  curve  in  that  family.  Because  of  the  large  errors  whica 
may  occur,  these  techniques  cannot  be  recommended  for  use  in  a statistical  compiler. 


VI.  DESCRIPTION  OF  AN  IMPLEMENTATION 


We  now  alter  the  approach  of  the  last  few  sections  and  describe  a practical,  although 
limited,  implementation  of  a statistical  compiler.  We  will  actually  describe  the  system,  called 
STAT,  twice;  once  from  the  viewpoint  of  a user,  and  then  again  from  the  viewpoint  of  the  im- 
plementor of  such  a system.  STAT  is  operational  on  the  Harvard  PDP-10,  in  conjunction  with 
the  ECL  programming  system  [Harvard  741.  The  user  of  STAT  is  expected  to  be  conversant 
with  ECL  and  to  be  operating  interactively. 

STAT  provides  its  user  with  an  additional  mode  ("a  new  data  type"),  called  rciilVamtotn,  and 
the  facilities  to  create  and  use  objects  of  this  new  mode.  These  facilities  arc  integrated  with 
the  ECL  system,  so  that  the  capabilities  to  loop,  call  functions,  perform  l/O,  and  other  such 
capabilities  of  a general  system  remain  available.  However,  the  operations  which  may  he  per- 
formed on  objects  of  mode  rcalXrundom  are  only  a restricted  subset  of  the  ECL.  operators. 

User  Description  of  STAT 

In  order  to  use  the  STAT  system,  the  user  begins  by  invoking  the  ECL  system  in  the  usual 
fashion  and  then  "LOAD"s  the  STAT  files.  When  this  operation  is  completed,  the  user  has  avail- 
able all  the  usual  ECL  capabilities  and  may  use  them  with  no  modification.  In  addition,  the  op- 
erator for  exponentiation  has  been  provided  which  may  be  applied  to  INTs  and  RKALs  with 
the  exnected  results. 

The  STAT  environment  also  provides  a new  mode  c-iled  rf<il\r:iiiili>m  and  several  functions 
("constructors")  to  create  objects  with  this  mode.  In  addition,  the  op.-rntoi  s 
"/",  and  " have  been  extended  to  deal  with  objects  of  mode  mulVainloiii,  although  not  all  possi- 
ble combinations  are  allowed. 

The  new  constructors  generate  random  varia.  les  with  some  standard  distributions,  as  well 
as  allowing  the  user  to  enter  an  arbitrary  distribution.  Although  the  current  version  of  ST  \T 
provides  relatively  few  constructors,  it  would  be  easy  to  add  m<  tc  as  no  other  parts  of  the  sys- 
tem are  dependent  on  the  set  of  constructors  provided.  The  current  net  of  « onstj  . clors  is: 

point  tXI’Rj  u: Mil  HI;  milXrumlom) 

This  function  produces  a random  variable  which  will  assume  the  value  a 
with  probability  1. 

ilistnbution  E.XI’Rf  (ub.SLIJfVLl  ll)R(J  IE  U )),  r.-uLruiiilem) 

Returns  a random  variable  Wnoso  prnbabii  ■ mass  function  is  gi.  ..  in 
lab.  tub  is  a sequence  of  pairs,  the  fitst  ant  of  which  is  a probability 
and  the  second  is  a discrete  value  assume  : by  tin-  randrm  variable. 

uniform  E\PR(  a.  ARI  III,  l>:  ARIfll;  malXramloui) 

Returns  a random  variable  whose  probability  density  function  is  a uni- 
form distribution  with  lower  limit  a and  upper  limit  b. 

guns  si  an  E.M’R  ( im-an ; Md  I II,  sigma:.ARI  III;  rralVainlori) 

Returns  a random  variable  whose  probability  density  function  is  a nor- 
mal (Gaussian)  centered  at  mean,  with  standard  deviation  sigma. 
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pois  son 


EXPR(  lambda:ARITH;  real\random) 

Returns  a random  variable  whose  probability  density  function  is  a 
Poisson  distribution  with  mean  and  standard  deviation  lambda . 

The  ccneept  of  independence  of  objects  of  mode  real\random  is  eritieal  to  the  implementation. 
The  constructors  described  above  produce  independent  random  variables  each  time  they  are  in- 
voked. Variables  of  mode  rcalVandam  which  are  calculated  through  the  use  of  operations  are  de- 
pendent on  their  computational  ancestors,  and  through  them  dependent  on  any  of  their  descen- 
dants. An  example  will  help  clarify  the  meaning  and  necessity  for  maintaining  these  dependency 
relations.  Consider  the  following  two  pieces  of  code: 

1.  a <- uniform(2,  4);  a + a; 

2.  a <- unifcrm(2,  4);  b <-  uniform(2.  4);  a+b; 

The  STAT  system  views  these  as  representing  two  quite  different  situations.  The  first  rep- 
resents choosing  a value  for  the  random  variable  a from  a uniform  distribution,  and  then  add- 
ing that  value  to  itself.  T.ie  resulting  distribution  is  identical  to  2*unifonn(2,4).  The  seeond  ex- 
ample represents  the  summing  of  two  independent  ehoiees  from  the  identical  uniform  distribution. 
a and  b are  independent  and  identically  distributed  random  variables,  and  the  result  of  their 
sum  is  a random  variable  whose  distribution  is  '.ne  convolution  of  uni[onn(2,4)  with  itself.  The 
means  for  the  two  expressions  are  the  same,  but  the  variance  of  the  first  is  larger  (in  faet, 
twice  as  large  as  that  of  the  seeond).  This  may  not  at  first  be  obvious,  but  can  be  explained  as 
follows:  In  the  first  case,  a value  is  chosen  from  the  distribution  and  doubled.  If  it  is  near  the 
high  end,  there  is  no  opportunity  for  it  to  be  countered  bv  a value  from  the  small  end.  In  the 
seeond  case,  if  the  value  selected  for  a is  high,  there  is  still  a chance  that  the  value  selected 
for  b will  be  small,  thus  reducing  the  variance  of  the  result. 

Example  Session  1 

-i  a <-  nniform(2,  4); 
a + a$ 

mean  6.0  var  = 1.3333335  skevness  = 0.0  kurtosis  1.8000408 

-'  a <-  uniform(2,  4); 

->b  <-  uniform(2,  4); 

->a+b$ 

mean  6.0  var  = 6.666<  678E-1  skewness  ~ -1.4016083E-5  kurtosis  2.4000297 

STAT  extends  the  usual  ECI,  operators  "+",  "*  and  "/"  to  apply  to  real\  random  obieets: 

" + and  may  have  their  left  and  right  arguments  be  rcal\ random,  INT  or  REAL  in  any  com- 

bination. "/"  permits  the  left  argument  (numerator)  to  be  nMtNrandom,  INT,  or  REAL  but  alloys 
only  INT  or  REALfor  the  right  argument  (denominator).  Exponentiation  is  also  provided  ("'"); 
its  left  argument  (hase)  may  he  either  INT,  REAL,  or  real\ramlom , while  its  right  argument  (ex- 
ponent) may  he  only  INT  or  REAL.  If  the  base  is  a rculNriiiidotn,  then  the  exponent  is  restricted  to 
a positive  integer.  All  those  operator o will  produce  a result  of  mode  rcal\random  if  either  input 
argument  (or  ooth)  is  of  mode  rfalNrandoin. 


The  only  output  mechanism  provided  is  the  extension  of  the  ECL  PRINT  routine  to  handle 
rtal\raHdumobjects.  Its  output  is  the  mean,  variance,  skewness,  and  kurtosis  fl  ] of  the  distri- 
bution, This  reflects  the  fact  that  the  internal  representation  of  the  random  variables  is  in 
terms  of  their  moments,  and  these  are  the  most  convenient  values  available. 

These  capabilities  can  be  combined  to  produce  the  solution  to  some  of  the  problems  we  pre- 
sented in  Sec.  1 to  motivate  this  work.  For  example,  consider  the  problem  of  locating  a high- 
speed turnoff  for  a runway.  We  may  write  a function  of  REAL  variables,  which  we  call  DisKo60MPH, 
which  will  calculate  the  runway  location  when  the  plane  reaches  60  MP1L  This  function  will  ac- 
cept three  arguments:  the  touchdown  velocit'.  of  the  aircraft  ft,  the  touchdown  location  (I  (i.e., 
distance  along  the  runway),  and  the  decel  n rate  ci  (assumed  constant).  The  code  for  this 

function,  assuming  ft  is  in  knots,  tl  is  ir.  and  d is  in  ft/sec/sec,  is  the  following: 

Distto60  MPH  <- 

EXPH(vt  HEAL,  tl:KEAL,  d:REAL;  HEAL) 

BEGIN 

DECL  kLREAL  BYVAL  6.076E3  / 3.6E3; 

DECL  vf:REAL  BYVAL  6.0E1  * 5.28E3  / 3.6E3; 

( (vt  * kl)  A 2 - vf  * 2)  / (2.0  * d)  + tl; 

END; 

The  constants  kl  and  if  represent  the  conversion  factor  for  knots  to  ft/sec,  and  the  value 
of  60  MP1I  in  ft/sec,  respectively.  The  result  of  the  function  is  in  feet.  This  routine  can  be 
directly  applied  to  calculate  values  for  specific  inputs.  For  example. 

— >Distto60MPll(l  1 3.0,  1500.0,  5.0)$ 

4.3629695E3 

This  then  tells  us  how  far  down  the  runway  an  aircraft  would  be  when  it  reached  6o  MP11, 
assuming  it  had  touched  down  at  1500  ft  from  threshold,  at  a velocity  of  113  knots,  and  decel- 
erated at  a constant  rate  of  5 ft/sec/sec. 

The  values  used  for  the  variables  are,  of  course,  not  known  precisely  in  advance  for  every 
aircraft  using  the  runway.  However,  the  distribution  of  approach  speeds  for  aircraft  of  specific 
types  is  a measurable  quantity,  and  some  estimate  of  the  distribution  can  be  obtained  from  a 
priori  knowledge  of  the  airplane  characteristics.  The  touchdown  location  distribution  can  be 
directly  measured  by  the  thickness  of  tire  rubber  left  by  aircraft  during  their  initial  touchdown. 

We  will  assume  that  these  have  been  measured  or  estimated  and  values  chosen  from  a uniform 
distribution  can  be  used  to  approximate  them  both.  In  the  case  of  touchdown  velocity,  values  in 
the  range  [98, 128 J knots  are  a reasonable  approximation  to  the  characteristics  of  medium-scale 
commercial  aircraft  (i.e.,  Boeing  727  class)  [Dolat  73).  and  a plausible  range  of  touchdown  lo- 
cation values  is  [1000,  2000]  feet. 

With  these  values,  a variant  of  the  program  is  needed  to  deal  with  random  variables  as  in- 
puts. This  may  be  generated  by  modifying  the  above  program  to  change  the  mode  declarations 


[1  I If  p;  is  the  i moment  about  the  mean,  then  we  define  skewness  and  kurtosis 
skewness  = p|4i23 
kurtosis  p4/p| 

See  [Kendall  63]  for  more  details  regarding  the  meaning  of  these  measures. 


as  follows: 


for  the  input  variables  vt  and  tl.  and  to  change  the  mode  of  the 
modified  code  is  then  the  following: 


output  to  include  reol\random. 


The 


Distto60MPH  <- 

EXPR(vt:ONEOF(real\random,  REAL), 
tl:ONEOF(reaI\random,  REAL), 
d:REAL; 

ON EOF( rea lNrandom , REAL)) 

BEGIN 

DECL  kl:REAL  BVVAL  6.076E3  / 3.6E3; 

DECL  vf.REAI  BYVAL  6.0E1  * 5.28E3  / 3.6E3; 
( (vt  * kl)  * 2 - vf  * 2)  / (2.0  * d)  + tl; 

END; 


This  code 
the  runway. 


can  then  be  directly  executed  to  obtain  the  distribution  of  the  60-MPH 


point  along 


->Distto60MPH(  uniform(98.0,  128.0),  uniformflOOO.O,  2000.0)  5.0)$ 

mean  = 4.3843339E3  var  = 3.9454025E5  skewness  = 6.4335654E-2  kurtosis  = 2.2038808 

hi  hIn  faCH'/hC  V6l0City  and  IOCati°n  3re  n0t  lik6ly  t0  Chan«6  desPite  introduction  of 

from  TZslT  < ;4hlfle/he/deCeICrati0n  rate  varied  easily  by  the  pilot  over  the  range 

from  3 ft/sec/sec  to  14  ft/sec/sec.  This  effect  may  be  observed  today  as  pilots  "shoot"  for 

specific  turnoffs  and  decelerate  just  hard  enough  to  make  that  turnoff. 

We  can  then  write  a driver  program  which  will  step  the  deceleration  rate  over  the  range 

from  to  14  ft/sec/sec  and  print  a table  of  results,  which  will  operate  correctly  for  either 
random  variables  or  real  variables  for  rt  and  ti . 

Driver  <- 


EXPR(vt:ONEOF(real\random,  REAL),  tI:ONEOF(real\random  REAL)  ) 
BEGIN 

RRINTC 

d 6o  MPU  Location 
’); 

FOR  i FROM  3 TO  14 
REPEAT 
PRINT(i); 

PRINT! 1 '); 

PRINT(Distto60MPH(vt,  tl,  i)); 

PRINT! ' 

END; 

END; 


This  may  then  be  executed  to  obtain  the  following  tables. 
->Driver(  113.0,  1500.0  ); 

60  MPB  Location 
3 6.27161 58E3 
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4 5.07371 19E3 

5 4.3629695E3 

6 3.8858079E3 

7 3.5449782E3 

8 3.2893  559E3 

9 3.0905386E3 

10  2.9314847E3 

11  2.801 3498E3 

12  2.692904E3 

13  2.601 1421 E3 

14  2.5224891E3 

->l)river  ( uniform(98,  128),  uniform(1000,  2000)  ); 

d 60  MPH  Location 

3 mean  = 6.3072231E3  v;.r  = 9.477965E5  skewness  = 7.9995624E-2  kurtosis  = 1.99806 

4 mean  = 5.1054174E3  var  5.69594E5  skewness  - 7.2433547E-2  kurtosis  = 2.1047709 

5 mean  = 4.3843339E3  var  = 3.9454025E5  skewness  = 6.4335654E-2  kurtosis  2.2038808 

6 mean  = 3.9036116E3  var  = 2.9944912E5  skewness  = 5.6316626E-2  kurtosis  2.2855201 

7 mean  = 3.5602385E3  var  = 2.421125E5  skewness  = 4.8762627E-2  kurtosis  = 2.3447075 

8 mean  = 3.3027087E3  var  = 2.0489863E5  skewness  = 4.194312E-2  kurtosis  =-  2.3819054 

9 mean  = 3.1024077E3  var  = 1.7938487E5  skewness  = 3.597244E  2 kurtosis  - 2.3989007 

10  mean  = 2.9421669E3  var  = 1.6113513E5  skewness  = 3.0791744E-2  kurtosis  = 2.4008041 

11  mean  = 2.8110608E3  var  - 1.4763231E5  skewness  = 2.6383154E-2  kurtosis  = 2.3915575 

12  mean  = 2.7018058E3  var  = 1.3736231E5  skewness  = 2.2648378E-2  kurtosis  = 2.3736388 

13  mean  = 2.6093592E3  var  = 1.2936988E5  skewness  1 .9486725E-£  kurtosis  = 2.3510717 

14  mean  = 2.5301193E3  var  = 1.2302806E5  skewness  = 1.6836281E-2  kurtosis  = 2.3252265 

Implementation  of  STAT 

As  indicated  in  Sec.  Ill,  the  major  portions  of  a statistical  compiler  system  which  differ 
from  a conventional  compiler  are  the  representation  choices  and  the  simplification  rules.  In 
addition,  the  basic  pieces  of  lexical  and  syntactic  analysis  must  be  provided.  Thus,  there  were 
three  major  design  choices  to  be  made  in  the  development  of  the  STAT  system.  (1)  the  repre- 
sentation, (2)  the  approach  to  simplification,  and  (3)  the  mechanisms  for  lexical  and  syntactic 
ana  lysis. 

For  the  first  two  questions,  we  have  extensively  discussed  the  options  in  the  previous  sec- 
tions. In  the  STAT  system,  only  one  representation  form  is  employed:  moments.  This  then 
implies  that  the  operations  conveniently  available  for  pairs  of  real\randoin  variables  are  multipli- 
cation and  addition.  Further,  it  is  easy  to  provide  addition  and  multiplication  by  a REAL,  and 
exponentiation  of  a real \random  to  a positive  integral  power.  These  are  thus  the  operations  that 
have  been  provided  to  the  STAT  user. 

For  simplification,  1 have  chosen  to  employ  primarily  the  separating  set  rule.  Phrased  in 
a more  intuitive  and  specific  fashion  than  in  Sec.  IV,  the  rule  states  that  if  a set  of  variables  can 
be  found  which  separates  the  root  of  the  computation  tree  from  the  terminal  nodes,  then  the  only 
information  required  to  compute  the  distribution  of  the  variable  associated  with  the  root  node  is 
the  joint  distribution  of  the  variables  in  the  separating  set.  In  order  to  avoid  keeping  large 


multidimensional  probability  distributions,  we  limit  our  selection  of  separating  sets  to  sets  with 
independent  variables.  Thus,  the  only  information  we  must  maintain  as  we  compute  through  a 
tree  is  the  moments  of  the  individual  variables  in  the  successive  separating  sets. 

A key  part  of  the  compiler  thus  determines  a separating  set  of  independent  variables  for  a 
given  root  node.  This  is  employed  recursively  starting  with  the  desired  output  node.  That  is, 
first  a separating  set  of  independent  variables  is  determined  for  the  output  node.  To  obtain  the 
moments  for  each  non-terminal  node  in  this  set,  the  same  routine  is  invoked  recursively  to  de- 
termine a separating  set  for  that  node.  This  continues  until  the  separating  set  determined  con- 
sists entirely  of  terminal  nodes  whose  moments  can  be  directly  evaluated. 

One  way  to  view  this  process  is  to  say  that  STAT  compiles  backward  through  the  program 
and  then  executes  forward.  In  order  to  determine  which  variables  form  the  necessary  separat- 
ing sets,  STAT  must  start  from  the  desired  output  variable.  It  can  then  move  back  to  the  vari- 
ables in  the  separating  set  chosen  for  the  output  variables,  and  do  the  same  operations  as  if 
each  variable  in  this  set  were  the  output  variable.  When  this  process  has  pushed  the  attention 
of  the  STAT  compiler  up  the  tree  to  the  terminal  nodes,  it  may  now  begin  evaluation  of  moments 
for  the  variables  in  the  separating  sets,  moving  down  in  the  tree  until  the  moments  for  the  out- 
put variahle  have  been  evaluated. 

The  ECL  system  environment  is  used  to  aid  in  the  construction  of  the  computation  tree  for 
the  simplification  and  evaluation  sections  of  the  STAT  compiler.  Such  tasks  as  lexical  and  syn- 
tactic analysis  of  the  source  program  arc  performed  by  the  ECL  interpreter.  In  addition,  all 
operations  on  modes  such  as  REAL,  1ST,  BOOL,  etc.,  available  in  ECL  are  directly  accessible 
to  the  STAT  user.  These  include  conditionals,  looping,  subroutine  invocation,  and  l/O  functions. 
So  long  as  rcalvumlom  variables  are  not  employed,  the  STAT  routines  are  not  invoked.  Also,  if 
a STAT  user  tries  to  perform  an  operation  on  a realXramiam  which  is  not  supported  in  STAT,  then 
he  will  receive  a MODE  ERROR  from  ECL. 

When  the  ECL  program  does  reach  supported  operations  on  rcal\random  variables,  the  STAT 
mechanisms  arc  invoked.  Unless  a print  operation  is  requested,  the  STAT  routines  merely  use 
these  calls  to  construct  the  computation  tree.  Each  n’alVamism  variable  is  represented  by  a data 
structure  which  is  a node  in  the  computation  tree.  Calls  to  a STAT  operation  are  used  to  link 
the  nodes  into  the  tree  structure  with  pointers  to  the  left  and  right  fathers  of  the  results,  and  to 
record  the  function  associated  with  the  node  in  the  data  structure.  Thus,  the  STAT  routines 
partially  decompile  the  ECL  program  to  obtain  the  structure  (computation  tree)  of  operations 
on  rt’ul\rumlam  variables  after  all  the  operations  on  REALs,  I.VTs,  and  ISOOLs  have  been  performed. 
All  loops  have  hecn  unwound,  and  all  subroutine  calls  involving  rcal\random  variables  have  been 
expanded  into  one  large  computation  tree. 

When  the  PRIM  routine  is  invoked  for  a rfaNrcmdom  variable,  that  variable  is  designated  as 
the  output  variable,  and  the  STAT  simplification  and  evaluation  code  is  invoked.  As  descr  bed 
earlier,  a separating  set  of  independent  variables  is  determined  for  the  output  variable. 

The  output  variable  is  then  expressed  symbolically  as  a multinomial  function  of  the  variables 
in  the  separating  set.  This  is  done  hy  backward  substitution  beginning  with  the  output  variable. 
The  output  variable  can  be  expressed  as  some  function,  say  addition,  of  two  other  random  vari- 
ables. These  in  turn  can  be  expressed  in  terms  of  their  fathers.  Finally,  the  expansi  in  will 
reach  members  of  the  separating  set  and  the  substitution  will  terminate. 


If  the  moments  of  the  variables  in  the  separating  set  are  available,  then  the  moments  of 
the  output  variable  may  be  obtained  directly  by  an  evaluation  process  on  the  multinomial  repre- 
sentation. Note  this  is  not  the  normal  evaluation  of  a multinomial;  rathr  r,  as  described  in 
Sec.  \,  we  must  evaluate  a term  X \ ^ as  E(X  )E(Y^).  To  obtain  the  higher  moments  of  the  out- 
put variable,  we  must  symbolically  raise  the  multinomial  to  a power  to  obtain  the  multinomial 
expansion  for  the  higher  moments. 

After  evaluation  of  a particular  output  variable,  all  the  intermediate  results  are  retained 
in  the  computation  tree  data  structure.  The  motivation  for  this  derives  from  the  fact  that  the 
ECL  user  is  operating  interactively  and  may  ask  for  further  operations  invoking  the  variables 
he  has  used.  It  would  clearly  have  been  possible  to  choose  not  to  retain  these  intermediate  re- 
sults, and  to  regenerate  them  as  needed. 

Note  that  the  moments  of  the  input  variables  arc  not  needed  until  after  all  the  symbolic  poly- 
nomial manipulations  are  performed.  Moreover,  once  this  analysis  of  the  program  structure 
has  been  performed,  different  input  distributions  could  be  specified  to  generate  different  output 
distributions  without  changing  the  intermediate  analysis.  ST  AT  has  not  been  organized  to  per- 
mit respecifying  the  input  distributions,  but  again  this  could  clearly  have  been  done.  In  this 
sense,  ST AT  is  a compile  and  execute  system  rather  than  a compiler. 


TABLE  VI-1 
STATIC  BREAKDOWN  OF 

STAT  CODE 

Lines 

of 

Code 

Percent 

ECL  Interfoce  and 

Computation  Tree  Construction 

195 

27. 5 

S impl location  Strategy  ond 

Evaluation  Control 

110 

15.5 

Polynomial  Manipulation  Pockage 

180 

25.4 

Set  Manipulation  Packoge 

40 

5.6 

Constructor  Definitions 

185 

26.  1 

TOTAL 

710 

The  static  bre;  kdown  of  the  code  in  the  STAT  system,  shown  in  Table  \ 1-1,  provides  some 
indication  of  where  the  programming  effort  was  spent.  The  polynomial  manipulating  package 
represents  the  weakest  portion  of  STAT,  since  the  implementation  is  done  using  a list-structured 
representation  for  the  polynomial  and  recursive  calls  to  perform  the  required  operations.  This 
led  to  ease  in  coding,  but  requires  long  execution  times. 


This  is  reflected  in  the  dynamie  code  breakdown  shown  in  Table  VI-2.  These  data  were  ob 
tained  on  the  Harvard  PDP-10  (a  KA-tO  processor,  earliest  of  the  PDP-10  series),  using  the 
ECL  interpreter  <o  exeeute  STAT  when  performing  the  statement 

->Distto60MPH(uniform(98.0,  128.0),  uniformdOOO.O,  2000.0),  5.0)$ 


TABLE  VI-2 

DYNAMIC  BREAKDOWN  OF  STAT  CODE 

CPU  Time* 
(sec) 

Percent 

1.  Computation  Tree  Formation 

1.71 

5 

2.  Sepcroting  Set  Selection 

0.81 

2 

3.  Polynomiol  Formation 

2.20 

6 

4.  Raising  Polynomials  to  Powers 
1 5.  Evaluating  Moments 

25.83 

72 

A.  Terminal  Nodes 

1.58 

4 

B.  Infermediote  Nodes 

3.43 

10 

6.  Miscellaneous 

0.49 

1 

TOTAL 

36.05 

100 

*Prelim inary  dota-individual  values  have  voriance  of  20  per- 
cent about  their  meon. 

There  are  a number  of  interesting  points  to  note  in  Table  VI-2.  Only  about  1 5 pereent  of 
the  work  (items  5A  and  5(1)  depends  on  the  moment  values.  Thus,  if  different  input  distribu- 
tions are  specified,  it  would  be  possible  to  recalculate  the  results  with  only  about  15  percent  of 
the  effort  required  tor  the  first.  The  STAT  system,  as  noted  earlier,  does  not  currently  pro- 
vide this  option. 

The  othe.'  point  of  particular  interest  is  that  the  time  to  perform  the  operation  of  raising 
the  polynomials  to  a power  and  evaluating  the  moments  could  be  easily  estimated  based  on  the 
number  of  terms  in  the  polynomials  formed  bv  substitution.  Thus,  .fter  about  15  pereent  of  the 
effort  had  been  expended  (steps  1,  2,  and  3),  the  time  for  the  remaining  steps  eould  be  predicted, 
and  even  presented  as  a table  depending  upon  t lie  number  of  moments  desired  in  the  output. 

A listing  of  the  STAT  system  and  a more  detailed  description  of  the  eode  may  be  found  in 
the  Appendix. 


VII.  PROBLEMS  IN  PROVIDING  A PRACTICAL  STATISTICAL  COMPILER 


In  the  last  section,  we  described  a statistical  compiler  system  with  a limited  capability.  In 
this  section,  we  emphasize  the  limitations  of  the  ST  AT  system  and  indicate  some  approaches  to 
the  construction  of  a "complete"  statistical  compiler  system. 

The  major  limitation  of  STAT  is  the  small  variety  of  operations  it  is  prepared  to  perform 
on  random  variables.  As  we  discussed  extensively  in  Sec.  V,  there  is  a complex  interaction  be- 
tween the  choice  of  representation  form  and  the  operations  which  are  provided  to  the  STAT  user. 
By  restricting  our  prototype  to  only  a few  operations,  we  were  able  to  choose  a representation 
which  is  convenient,  accurate,  and  efficient.  Extending  to  other  operations  however  will  involve 
compromises  in  some  or  all  of  these  areas. 

I* or  example,  assume  that  we  wished  to  permit  a random  variable  to  appear  in  an  arithmetic 
comparison  with  a real.  This  implies  evaluating  the  cdf  for  the  random  variable  at  the  value  of 
the  real.  As  we  indicated  in  Sec.  V,  this  conversion  is  expensive  and  error-prone  unless  the 
real  value  is  in  the  tails  of  the  distribution. 

Of  course,  if  the  operations  we  wish  to  perform  on  the  random  variables  are  exclusively 
comparison,  and  maximum/minimum  operations,  then  we  should  choose  the  cdf  as  the  repre- 
sentation form.  In  this  way,  we  could  construct  an  alternative  STAT  system  optimized  for  dif- 
ferent operations  and  convenient  for  those. 

We  can  pursue  this  further  and  construct  a number  of  different  statistical  compilers  each 
using  a representation  optimized  for  a particular  set  of  operations.  This  set  of  statistical  com- 
pilers would  each  handle  a subset  of  all  the  problems  one  would  like  a statistical  compiler  to 
handle. 

One  can  proceed  a step  further  in  the  direction  explored  by  Low  [Low  74|.  A master  statis- 
tical compiler  could  be  constructed  which  would  contain  all  the  particular  representation  forms 
we  have  the  knowledge  and  patience  to  implement.  The  performance  characteristics  of  the  par- 
ticular representations  would  be  parametrically  described,  and  the  master  compiler  would  then 
choose  the  representation  form  to  minimize  the  resources  required  at  execution  time. 

hollowing  Low,  there  are  a number  of  possible  ways  to  proceed  in  the  construction  of  such 
a master  compiler.  We  might  insist  that  conversions  between  representations  are  expensive 
and  will  not  be  permitted.  In  this  case,  we  "partition"  the  random  variables  according  to  the 
operations  performed  on  them.  For  each  random  variable,  if  the  set  of  operations  to  be  per- 
formed is  small,  then  there  may  well  be  an  optimum  representation  employed.  In  Low's  study 
for  representing  sets,  this  technique  of  choosing  one  representation  to  use  for  the  life  of  the 
variable  was  satisfactory  and  such  a compiler  was  constructed. 

In  the  statistical  compiler  area,  because  the  costs  of  using  the  wrong  representation  are 
severe,  and  because  no  one  representation  is  satisfactory,  we  may  be  forced  to  converting 
from  one  representation  form  to  another  As  Low  properly  indicates,  such  a compiler  would 
at  this  time  be  a research  effort  in  itself  and  constitutes  an  interesting  problem. 

There  are  several  further  poe«; bilitics  beyond  those  suggested  that  might  he  used  in  a sta- 
tistical compiler.  One  option  of  interest  in  a statistical  compiler  is  the  possibility  of  postpon- 
ing computational  effort  as  long  as  possible  because  the  intermediate  results  can  be  so  large. 
Indeed,  it  may  in  some  cases  be  more  economical  to  recalculate  the  results  as  needed  than  to 
save  the  result  in  storage.  Thus,  one  set  of  representation  options  would  be  to  produce  sub- 
routines which  will  calculate  the  values  of  specific  entries  in  the  representation  form  rather 
than  to  produce  the  representation  form  in  toto. 
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Because  any  specific  set  of  representations,  such  as  those  discussed  in  Sec.  V,  will  not 
be  convenient  for  some  base  functions,  it  is  necessary  to  provide  some  alternate  mechanism 
to  handle  these  operations.  The  obvious  alternative,  first  suggested  in  this  context  by  Berzing 
[ BerzinS  7 5|,  is  to  employ  Monte  Carlo  simulation  for  small  sections  of  the  program,  and  con- 
vert the  various  representation  forms  to  and  from  a sampling  representation.  A master  statis- 
tical compiler  with  this  option  can  guarantee  to  be  able  to  handle  any  "FOR TRAN " -type  program 
presented  to  it  using  random  variables,  although  there  will  surely  be  some  for  which  the  errors 
or  execution  time  will  be  unacceptable. 

The  master  compiler  can  and  should  offer  its  user  an  estimate  of  the  errors  and  execution 
time  of  his  program.  In  the  case  of  STAT,  we  could  easily  have  presented  the  user  with  a fairly 
good  prediction  (±20  percent)  of  the  running  time  of  his  program  after  only  15  percent  of  the  to- 
tal effort  had  been  expended;  we  could  have  gone  further  and  presented  the  user  with  a table  of 
running  times  as  a function  of  the  number  of  moments  desired  in  the  output.  Since  the  execu- 
tion time  may  be  large  for  cases  of  practical  interest,  this  would  permit  the  user  to  carefully 
consider  the  value  of  the  information  and  compare  it  with  the  cost  of  obtaining  this  information. 
The  analysis  to  calculate  the  running  time  can  be  performed  rapidly  enough,  even  for  large  pro- 
grams, to  be  performed  interactively,  while  the  large  execution  time  could  be  deferred  for  a 
batch  environment,  with  the  user  knowing  what  resources  he  will  be  expending. 

A practical  statistical  compiler  must  offer  its  user  a variety  of  output  options  for  their  ran- 
dom variables.  It  must  be  prepared  to  provide  moments  about  the  origin,  moments  about  the 
mean,  cdf,  or  pdf.  This  goes  directly  back  to  the  Sec.  V discussion  on  representation  conver- 
sions and  the  inherent  accuracy  limitations  of  some  of  these  conversions.  The  user  must  be 
warned  of  the  possible  inaccuracies  of  his  result  in  quantitative  terms.  It  is  not  acceptable  in 
an  operational  system  to  be  used  by  many  users  in  diverse  situations  to  indicate  "WARNING: 
CONVERSION  FROM  A MOMENT  REPRESENTATION  TO  A CDF  MAY  PRODUCE  SIGNIFI- 
CANT ERRORS  IN  THE  OUTPUT."  The  system,  if  it  is  to  be  used,  must  indicate  the  location 
of  the  potential  error  and  its  magnitude.  It  must  indicate  what  options  the  user  has  for  avoiding 
the  error  (i.e.,  request  your  output  as  moments,  not  cdf).  The  system  can  and  should  track  er- 
rors generated  ir.  intermediate  operations  and  indicate  the  effects  of  their  propagation.  Again, 
this  prescription  becomes  a significant  research  problem. 

A less  attractive  alternate  is  to  construct  subsets  of  such  a master  compiler  which  can  de- 
tect errors  of  significant  proportions,  and  then  insist  on  using  higher  accuracy’  in  this  subsec- 
tion of  the  program  (i.e.,  more  samples  in  a Monte  Carlo  approach,  higher  moments  in  a mo- 
ment representation,  more  points  in  a cdf,  etc.).  This  approach  can  provide  some  assurance 
of  adequate  error  control  although  not  as  reliably  as  tracking  errors.  It  may  require  large  or 
excessive  amounts  of  computing  to  reduce  errors  which  would  not  propagate  further. 

Another  problem  that  the  designer  of  a master  statistical  compiler  must  consider  is  the 
handling  of  user-written  subroutines.  In  STAT,  this  is  handled  by,  in  effect,  expanding  each 
subroutine  invocation  into  the  in-line  code  which  it  represents,  and  then  processing  the  whole 
program  at  once.  This  is  unacceptably  cumbersome  for  a compiler  intended  to  handle  programs 
ot  significant  size.  A practical  statistical  compiler  must  be  capable  of  "separately  compiling" 
subroutines.  It  should  produce,  in  the  run-time  environment,  a section  of  code  which  repre- 
sents the  subroutine  and  which  is  invoked  on  each  call  of  the  subroutine.  This  representation 
accepts  distributions  for  those  arguments  whose  data  type  is  ramiom,  and  produces  a result  which 
may  be  of  type  random. 
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The  problem  here  is  relatively  simple  if  the  input  variables  are  statistically  independent. 

In  that  ease,  the  problem  is  precisely  the  same  as  compiling  the  main  program,  and  is  thus  just 
the  problem  that  the  compiler  is  designed  to  cope  with.  However,  the  possibility  that  the  input 
variables  may  not  be  independent  eomplieates  the  construction  of  the  subroutine  eode.  Although 
the  techniques  indicated  in  Sees.  IV  and  V are  relevant  to  this  problem,  some  of  the  simplifica- 
tions were  obtained  by  the  assumptions  of  independence  of  the  input  variables.  Relaxing  this 
assumption  makes  simplifying  the  program  structure  more  difficult.  If  all  input  variables  are 
mutually  dependent,  then  no  effective  simplification  of  the  program  can  be  performed.  As  more 
mutually  independent  sets  of  input  variables  are  identified,  more  simplification  can  be  performed. 

Another  problem  in  the  construction  of  a statistical  compiler  is  the  question  of  an  optimum 
computer  architecture  for  the  execution  of  sueh  a compiler.  We  distinguish  this  into  two  phases: 
eompile  time  and  run  time.  In  eompile  time  activities,  the  activities  of  a statistical  compiler 
are  not  significantly  different  from  a regular  compiler.  There  are  some  differences  related  to 
spending  more  time  searching  computation  trees  and  performing  algorithms  on  these  trees.  In 
addition,  a relatively  small  percentage  of  the  total  time  is  spent  in  the  eompile  time  activities 
of  t ie  statistical  compiler.  The  possible  useful  hardware  modifications  are  in  the  areas  of  im- 
proved eharaeter  handling  and  staek  operations,  as  well  as  the  possibility  of  speeial-purpose 
hardware  for  algorithms  on  tree  structures.  The  speeial  tree  hardware  might  be  eapable  of  the 
following  operations: 

(1)  Data  Entry 

x is  father  of  y 
x is  son  of  y 
x is  brother  of  y 

ete. 

where  x is  being  entered  into  the  tree,  and  y is  already  a member. 

(2)  Data  Retrieval 

the  set  of  sons  of  x 

the  set  of  ancestors  of  x 

the  set  of  eommon  ancestors  of  x and  y 

where  x and  y are  elements  of  the  tree. 

(3)  Predicates 

is  x an  ancestor  of  y? 

are  x and  y independent? 

does  the  set  {z}  separate  x from  y? 

where  fz}  is  a set  of  elements  of  the  tree,  and  x and  y are  elements  of  the  tree. 

This  tree  manipulation  hardware  would  be  generally  useful  beyond  just  the  statistical  com- 
piler. I see  two  possible  implementation  approaches  to  such  hardware. 

(1)  Via  a microprocessor  operating  on  a private  memory  with  algorithms 
optimized  for  the  tree  task  using  random  access  memory  and  pointers 
in  the  structure  and, 

2)  Via  an  associative  memory  using  eontent  addressing  to  rapidly  retrieve 
items  with  the  desired  relationships. 
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Option  (1)  has  the  advantage  of  being  fairly  easy  to  construct  today  with  commercu'ily  available 
hardware,  while  option  (21  should  be  faster  but  is  significantly  more  expensive. 

The  architecture  for  support  of  the  run  time  system  is  the  more  important  issue,  and  is 
crucially  dependent  on  the  representation  forms  used.  Each  form  has  some  hardware  structure 
that  we  could  describe  to  optimize  its  execution;  some  of  these  are  commercially  available. 

For  example,  operations  on  pdf's  might  be  performed  using  hardware  convolvers  and  FFT  boxes 
These  devices  are  now  available  commercially  and  will  become  cheaper  over  time.  For  exam- 
ple, convolvers  on  a single  chip  of  CCD  or  MNOS  technology  are  just  now  appearing  in  the  lab- 
oratory [Tiemann  74 J. 

If  moment  representations  are  to  be  employed,  then  hardware  for  symbolic  manipulations 
of  polynomials  might  be  appropriate.  I have  no  specific  insight  into  the  construction  of  such 
hardware,  but  if  available  it  could  be  utilized  for  a statistical  compiler  as  well  as  in  a symbolic 
manipulation  system  [CACM  71 1. 

In  general,  the  analysis  of  programs  for  a statistical  compilation  reveals  sections  which 
can  be  pursued  in  parallel.  This  property  of  the  analysis  can  be  used  to  exploit  hardware  par- 
allelism for  those  programs  which  are  intrinsically  parallel.  These  parallel  sections  follow 
closely  disjoint  sections  of  the  underlying  computation  tree  which  permit  easy  deter  mination  of 
the  necessary  control  structure.  Thus,  the  picture  I have  of  a processor  designed  for  a sta- 
tistical compiler  run  time  system  would  have  the  following  major  features: 

(1)  A set  of  independent  processors,  not  necessarily  identical.  Some  of 
these  would  be  optimized  for  convolution  operations,  some  for  symbolic 
polynomial  manipulation,  some  for  Monte  Carlo  simulations,  etc.  This 
is  probably  best  handled  using  microprocessors  with  writable  micro- 
code so  they  can  be  rapidly  optimized  to  handle  the  particular  represen- 
tation form  for  the  assigned  section  of  the  tree. 

(2)  A master  controller  following  a tree  description  of  the  control  structure 
necessary  for  the  execution  of  the  program.  The  control  structure  would 
be  constructed  at  compile  time,  and  the  controller  would  then  see  to  initi- 
ating and  responding  to  the  terminations  of  processors. 

This  description  leaves  open  the  question  of  memory  organization  to  support  such  a collec- 
tion of  processors,  although  th's  is  likely  to  be  critical  to  the  success  of  a hardware  implemen- 
tation. If  the  processors  are  physically  clustered,  then  a common  multiport,  highly  interleaved 
memory  preserves  the  flexibility  to  alter  memory  allocation  dynamically.  Constructing  such 
a complicated  memory  bus  will  however  surely  occupy  a large  fraction  of  the  implementation 
effort.  The  alternate  approach  of  providing  each  processor  with  private  memory  is  less  attrac- 
tive in  this  case  because  of  the  large  and  variable  demands  upon  the  memory  resources  in  any 
particular  execution. 

This  hardware  organization  would  be  interesting  and  useful  for  a broad  variety  of  tasks, 
and  thus  could  be  made  economically  viable.  The  closest  approach  to  this  structure  to  my 
knowledge  is  the  C.  MMP  effort  at  Carnegie  [Wulf  72  ] which  uses  multiple  PDP-ll's  rather 
than  microprocessors.  The  design,  implementation,  and  software  for  such  a structure  re- 
mains a fascinating  and  challenging  problem. 
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VIII.  REMAINING  RESEARCH  PROBLEMS 


In  this  document,  we  have  presented  a new  philosophy  and  approach  to  computing  with 
random  variables.  The  work  presented  is,  however,  only  a beginning  in  this  area  and  much 
fruitful  work  remains  to  be  performed.  In  this  section,  we  indicate  some  of  the  major  research 
problems  that  remain  to  be  solved. 

A.  OTHER  APPROACHES 

In  Sec.  II,  we  presented  a non-computability  result  indicating  some  fundamental  limitations 
on  the  types  of  computations  which  can  be  performed.  As  we  indicated  immediately  following 
the  theorem,  there  are  a number  of  possible  problems  to  he  solved  within  these  limitations.  In 
the  remainder  of  the  report  we  have  only  explored  one  subproblcm,  but  the  solution  of  some  of 
the  others  would  be  a useful  addition  to  the  state  of  the  art. 


As  indicated  in  Table  VIII-1,  we  can  divide  the  problems  into  several  classes.  One  catego- 
rization describes  the  class  of  functions  the  statistical  compiler  can  handle.  For  this  discussion, 
I divide  the  classes  of  functions  loosely  into  "easy"  classes  and  "hard"  classes,  where  "easy" 
classes  do  not  include  Kleene's  predicate,  while  "hard"  classes  do.  The  clas3  of  computation 
trees  is  considered  an  "easy"  class  in  ttiis  sense.  Another  division  of  statistical  compilers  is 
based  on  whether  the  input  distributions  are  atomic  with  a finite  number  of  atoms,  atomic  with 
an  infinite  number  of  atoms,  or  continuous.  The  simplification  rules  described  in  Sec.  IV  apply 
to  all  these  types  of  distributions,  while  the  representation  techniques  in  Sec.  V emphasize  the 
last  two.  Finally,  a third  division  is  based  on  whether  the  techniques  used  for  the  statistical 
compiler  are  capable  of  providing  exact  results  assuming  the  computing  device  performs  exact 
arithmetic.  The  approach  in  this  work  has  been  exact  solutions,  as  contrasted,  for  example, 
with  Monte  Curio  techniques  which  are  inherently  approximate. 
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The  table  indicates  which  approaches  have  been  pursued  so  far.  As  can  be  seen,  the  table 
has  a major  gap  in  practical  techniques  for  the  "hard"  class  of  functions  and  finite  atomic  dis- 
tributions. The  only  approaches  currently  available  are  exhaustively  evaluating  the  function  for 
all  atoms  of  the  input  distribution  (not  practical)  or  sampling  the  input  distribution  (i.e.,  Monte 
Carlo  technique).  Some  intermediate  approach  which  analyzes  the  structure  of  the  computation 
to  reduce  the  amount  of  work  seems  fruitful.  However,  the  non-computability  results  imply 
that  any  such  techniques  must  critically  depend  on  the  finite  nature  of  the  distributions.  The 
key  nuestion  then  is  whether  such  algorithms  will  be  combinatorial’*  explosive  for  most  practi- 
cal problems  or  useful  for  a large  class  of  functions. 

The  other  problem  indicated  in  the  table  is  the  extensive  dependence  on  Monte  Carlo  and 
statistical  compeer  techniques  for  many  different  problems.  1 believe  that  more  efficient  ap- 
proaches to  the  "easy"  finite  cases  can  be  designed  which  will  permit  more  accurate  results 

than  Monte  Carlo  techniques  but  which  are  less  accurate  than  the  statistical  compiler  techniques 
described  here. 


B.  REPRESENTATION  TECHNIQUES 

The  analysis  of  representation  techniques  in  Sec.  V only  begins  to  approach  an  exceedingly 
complex  topic  which  is  at  the  heart  of  statistical  compilation  techniques.  The  discussion  in 

SeC'  ^ SUggeSts  a lsrge  vari£ty  of  techniques,  but  only  a few  of  these  have  had  significant  utiliza- 
tion. More  extensive  experience  with  these  techniques  is  clearly  necessary  to  evaluate  their 
suitability  or  use. 


TABIE  VI 1 1-2 

Base  Operations 

Representation  Techniques 

+,  - 

Characteristic  Functions 
Sampled  pdf 
Moments 

★ 

Moments 
Mellin  Transform 
Sampled  pdf 

MAX,  MIN,  CHOICE 

cdf 

SUM 

Generating  Functions 

I able  v.iI-2  mdi  ates  a number  of  base  functions  and  some  representations  which  are  par- 
ticularly convenient  for  those  base  functions.  The  table  shows  that  for  the  major  c aerations  a 
number  of  representation  choices  are  available.  However,  the  list  of  base  functions  here  is  rela 
tively  short,  excluding  such  important  operations  as  division  and  exponentiation.  Further,  the 
representation  choices  tend  to  be  specific  to  certain  ba*e  operations.  Thus,  there  are  two  ma- 
jor areas  of  further  effort:  (1)  new  representations  appropriate  to  other  base  operations,  and 
(2)  new  algorithms  for  f-convolution  for  specific  representations  (i.e.,  a convenient  algorithm 
for  + - convolution  on  sampled  pdfs). 
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Further  work  in  the  extension  of  f-convolution  algorithms  to  joint  probability  distributions 
could  significantly  improve  computational  speed  and  accuracy  for  complicated  programs. 

C.  APPLICATION  AREAS 

We  began  this  report  with  two  examples  of  applications  for  statistical  compile"  techniques. 

In  Sec.  VI,  we  showed  the  use  of  the  STAT  system  to  solve  one  of  these  problems.  These  ex- 
amples indicate  some  possible  uses  of  a statistical  compiler,  but  represent  a very  limited  sub- 
set of  the  actual  application  areas. 

In  general,  application  areas  for  a statistical  compiler  of  the  type  envisioned  in  Sec.  VR  ha^e 
a number  of  distinguishing  properties,  or  conversely,  if  a problem  has  these  distinguishing 
properties,  it  is  a candidate  for  use  of  statistical  compiler  techniques.  The  major  character- 
istics of  these  problems  are  (a)  there  is  a large  population  of  interest  (large  enough  for  statis- 
tical measures  to  be  accurate  enough  for  practical  purposes),  (b)  each  element  of  the  population 
has  certain  properties  which  are  expressed  numerically,  (c)  the  distribution  of  the  values  of  the 
properties  is  known  for  the  population,  and  (d)  a deterministic  description  of  the  behavior  (also 
specified  numerically)  of  an  element  of  the  population  in  terms  of  its  properties  is  available. 

Then  one  may  use  statistical  compiler  techniques  to  calculate  the  distribution  of  the  values  of 
the  behavior  of  the  population.  Moreover,  the  problem  may  be  presented  to  the  compiler  in  two 
parts:  the  behavioral  description,  and  the  property  distributions. 

However,  these  very  general  guidelines  do  not  indicate  the  real  limitations  of  the  techniques. 
As  we  have  indicated  elsewhere  in  this  report,  the  two  major  practical  problems  are:  (1)  the 
complexity  of  the  behavior  function,  and  (2)  the  joint  dependencies  of  the  input  distributions. 

How  then  is  a potential  user  to  decide  whether  10  employ  these  techniques.  If  a statistical 
compiler  as  envisioned  in  Sec.  VU  is  available,  the  user  may  ask  the  compiler  for  estimates  of 
running  time  and  accuracy.  Hut  this  is  partially  begging  the  questions,  for  the  real  issue  is: 

For  how  many  useful  problems  would  a statistical  compiler  produce  accurate  results  with  a plau- 
sible expenditure  of  resources'’,  or  expressed  more  succinctly,  Is  it  worth  the  effort  to  construct 
an  extensive  statistical  compiler? 

t'nt'ortunately,  at  this  point,  no  quantitative  data  to  support  an  answer  to  these  Questions 
exist.  It  is  clear  from  the  work  presented  here,  that  there  is  a class  of  problems  for  which 
these  techniques  are  applicable  and  efficient.  There  is  also  clearly  more  research  work  to  be 
performed  before  the  construction  of  a "practical"  statistical  compiler  can  oe  sized  accurately. 

Thus,  the  major  unsolved  question  remaining  in  this  area  is  its  ultimate  practicality.  Some 
specific  subquestions  can  be  phrased  to  help  the  resolution  of  this  question:  How  well  can  we 
characterize  the  execution  time  of  some  of  the  f-convolution  operations?  Are  there  specific  ap- 
plication areas  which  would  be  satisfied  with  just  the  base  operations  for  which  convenient  rep- 
resentations are  known?  llow  complicated  a process  is  the  rule-guided  optimization?  How  much 
storage  must  be  dedicated  to  store  joint  distributions  for  practical  problems-’ 

The  answers  to  these  questions  will  significantly  aid  in  determining  the  future  of  this  line 
of  inquiry.  My  belief,  at  this  point,  is  that  these  questions  will  take  time  to  answer,  but  that 
the  ultimate  utility  of  statistical  compiler  techniques  will  be  demonstrated. 


APPENDIX 

DESCRIPTION  OF  STAT  CODE 


The  STAT  system  is  constructed  entirely  in  ECL,  using  the  ECl,  mechanisms  for  mode 
extension  [VVegbreit  74].  These  mechanisms  have  been  general  enough  to  allow  for  all  the 
extensions  described  in  See.  VI  without  any  changes  to  the  ECL  system,  although  certain 
capabilities  to  properly  control  the  construction  of  reulVmulom  objects  have  only  appeared  in  the 
system  recently.  The  files  which  the  user  "LOAD"s  to  invoke  STAT  consist  of  several  primary 
components: 

(1)  A set  of  functions  required  by  the  ECL  system  to  specify  the  behavior 
of  the  new  mode  reatyamlom.  These  include  funetions  for  generation, 
conversion,  assignment,  selection,  and  printing  of  it  a I\  random  objects. 

(2)  Extensions  to  the  operators  " + ",  "/",  and  to  handle 

rt’illyamJom  variables.  In  the  standard  eases,  the  already  existing 
system  routine  is  invoked,  while  real\rmidom  objects  trigger  entry  to 
code  written  for  the  STAT  system. 

(3)  Definitions  of  the  constructors  point,  distribution,  uniform,  gaussiim,  and 
po is son, 

(4)  Definitions  for  a new  mode  called  poly,  used  to  represent  polynomials 
of  several  variables,  which  is  used  in  the  evaluation  of  rcal\rundom 
expressions  as  described  below,  and 

(5)  Routines  which  control  the  amount  of  evaluation  effort  expended  at 
each  point  during  the  execution  of  a program  involving  rcaKraitdom 
objects. 

The  data  structure  used  to  represent  rcalVaudom  objects  has  a number  of  fields: 

uumc  SVMI10L 

A guaranteed  unique  name  which  is  used  internally  to  refer  to  this 
random  variable. 

leflfather  PTR  f REAL,  I\1 , rculVuudom) 

nghtfalhcr 

These  fields  point  to  the  values  that  were  combined  to  produce  the 
random  variable.  They  may  point  to  another  random  variable,  or  a 
REAL  or  IXT  value.  Only  two  such  fields  are  necessary  because  only 
operators  on  one  or  two  variables  have  been  provided,  although  the 
extension  to  handling  functions  of  several  variables  merely  requires 
additional  father  fields. 

fu  ROUTINE 

The  operator  used  to  combine  the  fathers  to  obtain  this  random  variaole. 

mom  VECTORf  16,  REAL) 
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Space  to  hold  the  moments  about  the  origin  of  this  ram  \i  variable. 

Initially,  none  ol'  these  values  are  calculated-,  rather  the^  are  generated 
as  needed. 

airlflt  I NT 

The  index  of  the  last  entry  in  the  mow  table  which  has  been  filled. 

morngoi  PROC(  PTU(  rr<il\rmi<lom) ) 

A routine  which  calculates  additional  moments  as  necessary.  For 
random  variables  which  are  generated  by  constructors,  specific  momgoi 
functions  are  provided,  e.g.,  poissomnomgfM,  flflnssimmiompoi,  etc.  For  all 
intermediate  results,  a routine  called  iNlrrmomijm  is  used. 

dcstth  l\T 

The  index  of  the  last  entry  in  the  mom  table  which  should  be  filled  when 
the  momgrn  routine  is  executed. 

data  RET 

Some  of  the  momgrn  routines  associated  with  specific  constructors  re- 
quire space  to  store  their  internal  variables.  The  data  field  refers  to 
a data  structure  for  this  purpose.  The  exact  details  of  this  data  struc- 
ture vary  for  different  constructors. 

UI1C  IMK(rrSET) 

A pointer  to  the  set  of  rr«l\mmlom  objects  which  contains  all  the  compu- 
tational ancestors  of  this  random  variable.  Not  calculated  until  needed. 

When  a constructor  is  invoked,  it  invokes  the  generation  routine  for  reuKramlom  variables. 

The  generation  routine  provides  a blank  data  structure,  with  only  the  name  field  initialized.  The 
constructor  assigns  appropriate  values  to  the  data  and  itumuioi  fields  and  returns. 

Operators  also  invoke  the  generation  routine  to  obtain  a blank  structure,  and  using  a routine 
calleu  rrapply  assign  values  to  the  fields  mamgen,  h'Jlfutlirr,  and  nglilfallirr.  No  other  calculations 
are  performed  at  this  time. 

These  steps  cause  the  system  to  construct  the  computation  tree  for  the  computation  as  the 
user  is  assigning  values  to  variables  and  executing  his  routines,  whether  interactively  or  from 
predefined  functions.  Until  the  prim  function  is  invoked  on  a rt«l\r«ridom,  no  attempt  is  made  to 
evaiuate  its  moments. 

The  prim  routine  is  also  quite  simple.  When  invoked  on  a rc(i!\rmi<lcmi,  it  checks  c'lirllli  to  see 
if  four  moments  are  available.  If  they  are  not,  the  prinl  routine  sets  dcslth  to  lour,  and  invokes 
the  momgrti  routine  to  calculate  the  additional  moments.  When  this  is  complete,  four  moments 
arc  available  anil  may  he  printed. 

The  heart  of  the  system  is  then  the  computational  routines  invoked  via  the  moment  generating 
lunetions,  as  well  as  the  control  which  decides  which  moments  need  to  be  explicitly  calculated. 

We  describe  first  the  algorithms  used  for  some  of  the  constructor  momgni  routines  to  indicate 
the  algorithms  for  these,  and  then  proceed  to  describe  the  more  interesting  routines  invoked  bv 
imcrmomgni  for  non-terminal  node  « of  the  computation  tree. 
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Uniform  Moment  Generator 


ii 


The  moments  about  the  origin  of  a uniform  distribution  from  a to  b may  be  directly 
obtained  by  integration. 


n 


r 


(b  - a) 


dx 


b(n+1)  _ a(n  + 1) 
(n  + 1)  (b  — a) 


If  a speeifie  p^  is  desired,  this  formula  may  be  evaluated  directly,  but  if  all  values  of 
from  n = 1 to  k are  desired,  then  ti  j following  recursive  scheme  is  more  effieient.  Define 


f(n,  a,  b)  = 


b<n+l)  _ a<ntl) 


b - a 

Then  we  have  the  following  recursive  definition  of  f, 
f(0,  a,  b)  = 1 


as  may  be  readily  verified  by  induction. 


fin  + 1,  a,  b)  = a * f(n,  a,  b)  + b<n+1* 

Kn  = fin,  a,  b)/(n  + 1) 

Thus,  if  we  maintain  the  values  of  f(n,  a,  b),  and  b , we  may  calculate  each  higher  moment 
for  the  cost  of  two  multiplications,  one  division,  and  two  additions  as  well  as  the  cost  of  updating 
the  values  for  f and  bn.  In  addition,  two  locations  for  the  storage  of  f and  bn  are  required. 
These  are  provided  in  the  data  portion  of  the  rcalVandom  data  structure. 


Gaussian  Moment  Generator 


The  moments  about  th  : mean  for  the  Gaussian  are  easier  to  obtain  in  closed  form  than  the 
moments  abot  , the  orig:->  We  have  then  the  following  equation: 


2 /„_2 


Pn  i <x-m)ne-(x-m> 

**  -.on 

Let  y = x - m to  obtain 


dx 


2,_2 

n -y  /Zu 

y b J ' dy 


We  may  integrate  this  bv  parts  to  obtain 

r y"  e-yVz* 

• * — co 


2 /.,  2 n+1  /Za^ 


dy 


= 


n + 1 


f 


'«»  m2 

y 


(n  + 1)  (7 


2,_  2 

2 o_y  /Z°  dy 


2/2  2 

Sinee  lim  y e ^ ' a - 0,  we  have  the  following  recursive  relation: 

y->±oo 


r 


yn+2e-y2/2a2dy  =(n  + 1)a2  r yn.-y> 

^ _ on 


2,-2 
„ Zo  , 
e dy 
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or  expressed  in  a more  usable  form: 
i^o  = 1<  = 0 

Hn+2  = (n  + 1)  * c2  * % 

This  implies,  as  one  would  expect,  that  all  odd  moments  about  the  mean  are  zero. 

In  order  to  obtain  moments  about  the  origin,  we  make  use  of  the  following  relationship  be- 
tween moments  about  the  mean  pn  and  moments  about  the  origin  ^ [Kendall  63J. 


We  may  rewrite  this  to  obtain 

^n-ni:  (n)^i)"‘J 

j=0  J 

expressing  the  nth  moment  about  the  origin  in  terms  of  the  nth  central  moment  and  other  moments 
about  the  origin.  The  polynomial  in  (-p^)  is  best  evaluated  by  Horner's  rule,  calculating  the 
combinatorial  coefficients  in  the  same  loop. 

Moment  Calculation  for  Derived  liesults 

As  described  in  Sec.  V,  we  can  evaluate  the  moments  of  a multinomial  function  of  independent 
random  variables  directly  from  the  moments  of  those  independent  random  variables.  It  is  this 
technique  which  is  used  to  implement  the  evaluation  of  moments  for  derived  values. 

In  order  to  evaluate  the  moments  of  a rcal\random  variable  X which  is  not  a direct  result  of 
a constructor,  we  then  proeeed  as  follows: 

(1)  find  a set  (referred  to  as  an  lid  set  for  reasons  described  later) 
of  independent  random  variables  which  separates  (see  separating 
set  definition  in  Sec.  IV)  X from  its  terminal  ancestors  in  the  com- 
putation tree  (i.e.,  from  all  the  eonstructor-generated  random 
variables  which  are  computational  ancestors  of  X). 

(2)  Express  X as  a multinomial  function  of  the  random  variables  in 
the  lid  set. 

(3)  Evaluate  (perhaps  by  recursive  calls)  the  moments  of  the  random 
variables  in  the  I in  set. 

(4)  Use  these  moments  to  perform  the  evaluation  of  the  moments  for  X. 

In  order  to  reduce  the  computational  effort  as  much  as  possible,  both  in  finding  a multi- 
nomial representation  and  in  evaluating  thr  multinomial,  it  is  desirable  that  the  size  of  the  lia 
set  be  as  small  as  possible.  Further,  the  set  must  consist  of  independent  ancestors  of  X. 

Because  of  this  we  refer  to  this  set  as  the  Jeast .independent  ancestor  Ilia  ) set.  There  is  not 
a unique  least  ancestor  set  for  any  arbitrary  random  variable  X in  any  computation  tree. 

Moreover,  generating  a set  which  is  guaranteed  to  be  minimal  is  not  a simple  computation. 

We  choose  therefore  to  calculate  a set  which  is  guaranteed  to  be  independent  and  a separating 


65 


set,  which  insures  the  accuracy  of  the  computation,  but  we  cannot  guarantee  that  it  is  minimal. 

T ius  there  is  a routine  in  the  system  called  lia  which  when  applied  to  a random  variable  calcu- 
lates an  approximation  to  a least  independent  ancestor  set,  and  it  is  this  set  we  refer  to  as  the 
lia  set. 

Once  the  lia  set  is  determined,  the  routine  silbsl  is  invoked.  This  routine  calculates  the 
multinomial  function  representation  for  the  random  variable  X in  terms  of  members  of  the  lia 
set.  Subst  first  calculates  the  multinomial  representation  of  X.leftfalher  and  X.rightfather  in  terms 
of  thelia  set  of  X.  It  then  uses  the  symbolic  polynomial  manipulation  package  which  is  part  of 
SlfAT  to  combine  these  two  according  to  the  appropriate  operation  in  X.fn.  In  fact,  since  the 
routines  which  implement  " " + "*  ",  "/",  and"-'',  also  accept  arguments  of  mode  poly, 

they  are  invoked  directly  to  perform  the  polynomial  manipulations. 

When  the  substitution  phase  is  complete,  each  term  in  the  polynomial  is  examineu  to  deter- 
mine the  highest  degree  for  each  lia  set  member  in  the  multinomial.  For  example,  if  the  derived 
multinomial  for  X is  X = 3 * Z + 2 * -t  4 * Z^  * Y^,  then  the  highest  degree  for  Z in  this 
multinomial  is  2,  and  the  highest  degree  for  Y is  3.  If  the  number  of  moments  desired  for  X 
is  k (the  value  in  X.dfsllll  ),  then  2 * k moments  for  Z are  required,  and  3 * k moments  for  Y 
are  needed.  If  this  number  of  moments  for  these  variables  are  not  available,  then  their  moment 
generating  functions  are  invoked,  perhaps  recursively,  to  obtain  them.  Once  all  the  needed 
moments  are  available,  the  multinomial  may  be  evaluated  to  obtain  the  first  moment  of  X.  As 
described  in  Sec.  V,  moments  of  order  k may  be  obtained  by  symbolically  raising  the  multi- 
nomial function  to  the  power  k,  and  then  evaluating  the  multinomial  obtained.  The  evaluation 
rules  for  multinomials  in  this  context  are  not  the  normal  rules  of  substitution  since  no  specific 
values  exist  for  the  random  variables  in  the  lia  set.  Rather,  when  a term  involving  Zk  is  evalu- 
ated, the  kth  moment  of  Z is  substituted  for  Zk. 

By  far,  the  bulk  of  the  computational  effort  expended  occurs  during  the  symbolic  evaluation 
of  the  powers  of  the  multinomial  representation.  Tne  amount  of  effort  required  could  be  easily 
estimated  on  the  basis  of  the  number  of  terms  in  tne  multinomial  and  used  to  allow  the  user  of 
the  system  to  decide  on  the  number  of  moments  he  requires  when  presented  with  the  cost  to 
compute  each  moment.  No  such  provision  is  currently  provided,  however. 

The  remainder  of  the  system  provides  more  conventional  capabilities  such  as  the  polynomial 
and  set  manipulation  packages  used  to  implement  the  moment  generating  routines.  These  will 
not  be  further  discussed  here. 

A listing  of  the  code  for  the  STAT  system  follows. 
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poly  <-  poly  ::  SE0( STRUCK  coeff : REAL , var : PTR( "term" ) ) ) ; 
term  <-  term  : : SEQ( STRUCT( var : PTR ( "rrSTRUCT " ) , exp:IN'D); 
I N F I X ( " ! < " , 150; 


< <- 

EXPR(x:ONEOF(REAL,  INT , term),  y :ONEOF ( REAL  . INT,  term);  BOOL) 
BEGIN 

CASE(COVERS)[MD(x) . MD( y ) ] 

[ARITH,  AR1TH]  = > x !<  y; 

[term,  term]  => 

BEGIN 

DECL  lx : INT  BYVAL  LENGTH ( x ) ; 

DECL  1 y : I NT  BYVAL  LENGTH! y ) ; 

FOR  1 FROM  1 TO  LENGTH ( y ) 

REPEAT 

; > lx  =>  TRUE; 
x L i ] .var  I)  v[  i J .var  ;> 

VAL( x[ i J .var . name .TLb ) '< 

.’AL(  y [ i J . var  . name  .TL r ) ; 
xtij.exp  * y [ 1 ] .exp  :>  y.tij.exp  !<  y [ i ] exp; 

FALSE; 

END; 

END; 

TRUE  z>  BREAK! TYPE  ERROR  IN  <'1; 
c.ND; 

END; 

rrSTRUCT  <- 
rrSTRUCT  :: 

STRUCT! name :SYMBOL, 

leftfather:PTR(REAL,  INT,  "rrSTRUCT"), 

rlghtfather: PTR! HEAL,  INT,  "rrSTRUCT"), 

momgen : PROC! PTR( "rrSTRUCT" ) SHARED  1 , 

fn : ROUTINE  , 

curl t h : I NT , 

deslth : INT , 

mom : VECTOR ( 16,  REAL) , 

anc : FT  R ( "rrSFT " ) , 

data : REF ) ; 
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t:realSrandom;  realSrandom) 


rafn  <- 

EXPR ( s : real \ random , 

BEGIN 

BECL  lowers: realN random. UR  SHARED  LOWER(s); 

DECL  lowert : realN random . UR  SHARED  LOWER ( t ) ; 

lowers . leftfather  <-  lowert; 

lowers. fn  <-  iden; 

lower s . momgen  <-  intermomgen; 

s; 

END ; 

iden  <-  EXPR( x : poly;  poly)  x; 
rcfn  <- 

EXPR(s : real Srandom , t :MGBE ; ANY ) 

BEGIN 

t = NONE  =>  NOTHING; 

COVERS ( t , PTR ( rrSTHUCT ) ) OR  COVERSit,  REF)  z>  LOWER(s) 
BREAK( 'CONVERSION  FROM  realSrandom  — TYPE  ')• 

END; 

rsfn  <- 

EXPR(s: realSrandom,  a :ONEOF { INT , SYMBOL):  ANY) 

[)  BREAK 'SELECTION  ON  realSrandom  •)  (]; 

rpfn  <- 

EXFR(s: reals  random,  p:P0RT;  realSrandom) 
tEGIN 

DECL  mom : VECTOR( 16 , REAL)  SHARED  LOWER ( s ). mom ; 
eval(s,  4); 

DECL  v a r : R f A L BYVAL  mom[2]  - nom[1]  “ 2; 

PRI NT ( ' mean  = ' , p)  ; 

PRINT(mom[1],  p); 

PRINT ( ' var  , p)  ; 

PRINK  var  , p) ; 
var  <-  ABS  var; 
var  It  0.0  -> 

BEGIN 

DECL  alpha ? : REAL  BYVAL 

(mom[3]  - 3 * mom[1]  * mom[2]  + 2 * mom[1]  " 3)  / 
var  " 1.5; 

DECL  beta2 : REAL  BYVAL 

( mom [ 4 ] - 4 » mom[1]  * mom[3]  + 

6 * mom[  ! ] 2 * mom[2]  - 3 * mom[1]  ~ 4)  / 

var  " 2; 

PRINT(*  skewness  p); 

PRINT ( al pha3 , p ) ; 

PRINT('  kurtosis  p); 

PRINK  beta2 , p); 

END; 

s; 

END; 
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uniguenameotr  <-  1 


uniquename  <- 
EXPH(;  SYMBOL) 

BEGIN 

DEC L s .’SYMBOL  BYVAL 

HASH ( SCONCAT (' name  ' , BASIC\STR(uniquenamectr  <+  1))); 
s.TLB  <-  ALLOC(INT  BYVAL  uniquenamectr ) ; 
s; 

END; 


rgfn  <- 

EXPR(m;MODE  , s : SYMBOL , 1 : AMY ; ON EOF ( real \ random ) ) 

BEGIN 

n I)  real\random  = > BREAK( ' GENERATION  ERROR  - realXrandori:' 
s = "BYVAL"  =>  point(l) ; 
s = "LIKE"  OR  s = "SHARED"  r> 

BEGIN 

CASE(COVERS)[MD(l)] 

[ real\random]  =>  1; 

[ PTR( rrSTRUCT  ) ] = > LIFT(1,  real\random ) ; 

[REF]  hD(VAL(l))  = rrSTPUCT  => 

BEGIN 

DECL  z : PTR ( rrSTRUCT ) ; 
z <-  1 ; 

LIFT(z,  realXrordom) ; 

END; 

TRUE  z>  BREAK( ’CONVERSION  TO  realXrandom  '); 

END ; 

END; 

DECL  x:real\random.UH  BYVAL  ALLOC ( rrSTRUCT ) ; 
x.name  <-  uniquename( ) ; 
s s "SICE"  =>  LIET(x,  realXrandom); 

BREAK( 'GENERATION  ERROR'); 

END; 


realXrandom  <- 
<*  "realXranaom"  , 

UCF(rcfn)  , 

UAF  ( raf'n  ) , 

USF ( rsfn  ) , 

UPF( rpfn ) , 

UGF ( rgfn  ) , 

SU PUG F( TRUE  ) *>  ::  PTR ( rrSTRUCT ) ; 


rrapply  <- 

EXP  R ( f :SYMBOL , 

x :ONEOF ( realNrandom , INT,  REAL), 
y :ONEOF(realNrandom,  INT,  REAL ,’ NONE) ; 
real\random) 

BEGIN 

I'ECL  7 : realN random ; 

DECL  lowerz:rea IN  random. UR  SHAREr  LOWER(z)- 

lowerz.fn  <-  VAL(f.TLB); 

lowerz. momgen  <-  intermomgen ; 

lowerz. leftfather  <- 

faEGIN 

MD(x)  = realNrandom  :>  LOWER(x)- 
ALLOC ( HD( x ) BVVAL  x): 

END; 

lowerz. ripht father  <- 
BEGIN 

MD(y)  = real\ random  = > LOWER ( y ) , 

ALLOC ( MD( y ) BYVAL  y) ; 

END; 

z; 

END; 

INFIX! " ! “ " , 225  , TRUE); 

FLUSH! ') ; 

INFIX!”'",  ?T5 , TRUE); 

!'  <-  EXPONENTIATION; 


EXPR ( x : ONhOF ( INT  , REAL,  rea 1 \ random  polv) 
y : A R I T H ; 

ONbOF ( INT  , RFAL,  realNrandom,  nolyl) 

BEGIN 

CASE(COVEhS)[MC(x),  ND( y ) ] 
tARITH,  ARITH]  r>  x •'  y; 

[poly,  INT] 

BEGIN 

U-CL  p : PI  f,  (polv)  BYVAL  ALL0C!do1v  SIZE  1); 

p[ 1 J .coef f <-1.0; 

t)[1].var  <-  ALLOC  (term  SIZE  0); 

FOR  i FROM  1 TO  y 

REPEAT  p <-  ALLOCtpoly  LIKE  x * VAL(n))  END 
VAL(p) ; 

END; 

[realNrandom,  INT]  =>  rracplv!"'"  x v)- 

T R 1 1 1 \ U D L A L < I " ~ v n r:-  • \ ’ - * 


INFIX( "!/»,  200,  TRUE); 
!/  <-  QUOTIENT; 


/ <- 

EXPR ( x : ONEOF ( INT  , REAL,  real \ random , poly), 
y : ARITH  ; 

ONEOF(INT,  REAL,  realXrandom,  poly)) 
BEGIN 

CASE(COVERS)[MD(x)] 

[realXrandom],  [poly]  =>  x * (1.0  / y); 
TRUE  =>  x ! / y; 

END; 

END; 

INFIXP'!*",  200,  TRUE); 

! * <-  PRODl'CT; 


EXPR ( x : ONEOF ( INT  , REAL,  rsal X random , Doly,  term), 
y:0NEOF(INT,  HEAL,  realXrandom,  poly,  term); 
0NE0F(INT,  REAL,  realXrandom,  Doly,  term)) 

BEGIN 

DECL  mx:M0DE  BYVAL  MD(x); 

DECL  my :MODE  BYVAL  KD( y ) ; 

CO VERS ( ARITH,  mx  ) AND  COVERSp  RITH , my)  =>  x !*  y; 
COVERS ( poly , mx)  AND  COVERS ( ARITH , my)  :> 

BEGIN 

y = 0 OR  y : 0.0  =>  CONST (poly  SIZE  0); 

BEGIN 

DECL  z : poly  BYVAL  x; 

FOR  i FROM  1 T(J  LENGTH(z) 

REPEAT  z[i] .coeff  <-  z[i].creff  * y END; 

^ » 

END; 

END; 

COVERS ( ARITH , mx ) AND  COVERS(poly,  my)  =>  y « x; 

COVERS ( rea IX random , mx)  AND  COVERStpoly,  my)  OR 
COVERS(poly,  mx)  AND  COVERS(real\random,  my)  OR 
COVERS(term,  mx ) AND  COVERS(realXrandom,  my)  OR 
COVERS(real\random,  mx)  AND  COVERS(term,  my)  OR 
COVERS ( term  , mx)  AND  C0VERS( ARITH , my)  OR 
CO VERS ( AR ITH  , mx  ) AND  COVERS(tern,  my)  i> 

BREAKCTYFti  ERROR  «'); 

CO VERS ( term , mx  ) AND  COVERS(poly,  my)  => 

BEGIN 

DECL  z : poly  BYVAL  y; 

FOR  i FROM  1 TO  LENGTH ( y ) 

REPEAT 

z[ i ] . var  <-  ALLOC(term  SHARED  V AL ( y [ i ] . var ) « x); 
END; 

z; 

END; 


COVEHS( poly , mx ) AND  COVERS( term , my)  r>  y • x- 
COVERS( term , mx)  AND  COVERS(term,  my)  = > 

BEGIN 

DECL  lx : INT  BYVAL  LENGTH(x) ; 

DECL  ly : INT  BYVAL  LENGTH ( y ) ; 

DECL  z : term  SIZE  lx  + ly; 

DECL  ix : INT  BYVAL  1; 

DECL  iy : INT  BYVAL  1; 

DECL  j : INT  BYVAL  0; 

REPEAT 

DECL  copy : ROUTINE  BYVAL 

EXPR ( x : term  SHARED,  ix:INT  SHARED,  lx:INT  SHARED) 
REPEAT 

ix  > lx  =>  NIL; 

j <+  i; 

z[ j ] <-  x[ ix] ; 
ix  <+  1 ; 

E;,L, 

ix  > lx  = > copy ( y , iy,  ly) ; 
iy  > ly  :>  copy ( x , ix,  lx); 

BEGIN 

j <+  i; 

VAL ( x[ ix ]. var . name  .TLB)  = VAL( y[ iy ]. var . name . TLB ) => 
BEGIN 

z[j]  <-  x [ ix  ] ; 

z[  j ] . exp  *■  y[  iy]  .exp; 

ix  <+  1 ; 

iy  <+  i; 

END; 

VAL(x[ix] .var. name. TLB)  > VAL( y[ iy ]. var . name .TLB)  => 
[)  z[ j]  <-  x[ix] ; ix  <+  1 (]; 
z[j]  <-  y[ iy  ] ; 
iy  <+  i; 

END; 

END; 

BEGIN 

DECL  oz : FTR( term  ) BYVAL  ALLUC(term  SIZE  ]); 

FOR  i FROM  1 TO  j REPEAT  ozfij  <-  z[i]  END; 

VAL( oz)  ; 

END; 

END; 

COVERS ( poly , mx)  AND  COVERS(polv,  my)  => 

BEGIN 

DECL  z : PTR ( pol y ) BYVAL  ALL0C(poly  SIZE  0); 

LENGTH ( y ) < LENGTH(x)  :>  y * X; 

FOR  i FROM  1 TO  LENGTH ( x ) 

REPEAT 
z <- 

ALLOC (poly  LIKE 

VAL(z)  + x[i] .coeff  * ( VAL( x[ i ] . var ) * y)); 

END ; 

VAL(z) ; 

END; 

TRUE  = > rrapply("»",  x,  y); 

END; 


i 


INFIX( " I , 175 
PREFIXC " I ; 

! - <-  DIFF; 


TRUE)  ; 


EXPR ( x :ONEOF ( INT  , REAL,  realNrandom,  poly), 

y:ONEOF(INT,  REAL,  realNrandom,  poly,  NONE); 
ONEOF(REAL,  INT,  realNrandom,  poly)) 

BEGIN 

CASE ( COVERS )[ MD( x ) , MD( y ) ] 

[ ARITH , ARITH  ] =>  x !-  y; 

[ARITH,  NONE]  =>  I-  x; 

[poly,  NONE]  => 

BEGIN 

DECL  z:poIy  BYVAL  x; 

FOR  j FROM  1 TO  LENGTH ( z ) 

REPEAT  z [ j ] .coeff  <-  !-  (r[ j ] .coeff ) END; 

z; 

END; 

[realNrandom,  NONE]  =>  rrapply( x); 

TRUE  =>  x + - y; 

END; 

END; 

INFIXC'U”,  175,  TRUE;; 

! + <-  SUM; 


+ 

EXPfi(x:ONEOF(REAL,  IKT , realNrandom,  poly), 
y :ONEOF( REAL  , IN'",  realNrandom,  poly); 
ONEOFCREAL,  INT  realNrandom,  poly)) 

BEGIN 

CASE(COVERS)[MD(x) , MD( y ) ] 

[ARITH,  ARITH]  =>  x 1+  y; 

[poly,  ARITH]  => 

BEGIN 

LENGTH ( VAL( x[ 1 ] . var ) ) = 0 => 

[)  DECL  z:poly  BYVAL  x;  z[1], coeff  <+  y;  z 
DECL  z:poly  SIZE  LENGTH(x)  + 1; 
z [ 1 ] . coeff  <-  y ; 
z[ 1 ] .var  <-  ALLOC ( term  SIZE  0); 

FOR  i FROM  1 TO  LENGTH ( x ) 

REPEAT  z [ 1 + 1 ] <-  x[i]  END; 

z; 

END; 

[ARITH,  poly]  =>  y «■  x; 

[poly,  poly]  => 

BEGIN 

DECL  lx  : INT  BYVAL  LF.NGTH  ( x ) ; 

DECL  1 y ; INT  BYVAL  LENGTH(y) ; 

DECL  z:poly  SIZE  lx  + ly; 
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DECL  ix : INT  BYVAL  1; 

DECL  iy : INT  BYVAL  1; 

DECL  j : INT  BYVAL  0; 

REPEAT 

DECL  copyrROUTINE  BYVAL 

EXPR( x : poly  SHARED,  ix:INT  SHARED,  lx:INT  SHARED) 
REPEAT 

ix  > lx  =>  NIL; 
j <+  i; 

z[j]  <-  x [ ix  ] ; 
ix  <+  1; 

END; 

ix  > lx  =>  copy ( y , iy  ly ) ; 
iy  > ly  =>  copy ( x , ix,  lx); 

BEGIN 


j <+  i; 

VAL ( x [ ix ] . var ) = VAL(y  iyj.var)  => 
BEGIN 

z[  j ] <-  x[ ix] ; 
z[ j] .coeff  <+  y[i y ] .coefi ; 
ix  <+  1 ; 
iy  <+  1; 

z[jj. coeff  = 0.0  =>  j <_  I _ i • 
END; 

VAL(x[ix] .var)  < VAL(y[iy] .var)  :> 

[ ) z[ j]  <-  x [ ix ] ; ix  <+  1 ( ] ; 

ztjJ  <-  y E iy  ] ; 

iy  <♦  1; 

END; 

END; 

BEGIN 


DECL  oz : PTR ( poly)  BYVAL  ALL0C(poly  SIZE  1): 
FOR  i FROM  1 TO-j  REPEAT  oz[i]  <-  z[i]  END; 
VAL(oz); 

END; 

END; 


[poly,  realNrandora]  , [realNrandom,  Doly]  r> 
BREAK(’type  error  +'); 

TRUE  =>  rrapply(”+”,  x,  y); 

END; 


END; 
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eval  <- 

EXPH(u : PTR  ( rrSTRUCT ) , k:INT) 

[)  u.deslth  <-  k;  u.deslth  > u.curlth  O u.momgen(u)  (], 
rrSET  <-  rrSET  ::  STRUCT (members :SEQ( BE~' pt rs : FI h ( "rr MEM " ) ) ; 
rrMEM  <-  rrMEM  ::  STRUCT ( elem : PTR ( rrSIRU.  ') , next:PTR( "rrMEM") ) 
INFIX ( "element " , 150); 


insert  <- 

EXPRU-PTR^rSTRUCT),  y:rrSET  SHARED;  rrSET) 

□hCi  I W 


x element  y z>  y; 

VAL(x. name. TLB)  > LENGTH(y .members) 
y. members[VAL(x. name. TLB) J <-  TRUE- 
y.ptrs  <-  ALLOC(rrMEM  OF  x,  v.ptrs) 

y; 

END; 


= > BREAK(  ’ INSERTION’ 


element  <- 

EXPRCxtPTR^rSTRoCT),  y:rrSET;  BOOL) 
dLUIN 

VAL(x. name. TLB)  > LENGTH ( y . members ) 
y.members[VAL( x name  .TLB ) I ; 

cnihi  J T 


=>  FALSE; 


intermomgen  <- 

EXPR  ( u ; PTR ( rrSTRUCT  ) SHARED) 

BEGIN 

CECL  y : rrSET  LIKE  lia(u) ; 

DECL  sym : poly  LIKE  substtu,  y); 

Oalcdeslth( sym , LOWER ( u ) . deslth) ; 

DECL  x:PTH(rrMEM)  BYVAL  y.ptrs; 

REPEAT 

x = NIL  r>  ML; 

x'<-ex!n»xt-h  > x’elern’curlth  _>  v -elem. momgen(x. elem) ; 
END; 

evalpoly( u , sym ) ; 

END; 
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Ct  M 


<- 

XPR ( x : PTR ( rrSTRUCT ) ; rrSET) 

BEGIN 

x . leftfather  = NIL  AND  x.r  .ghtfather  = NIL  => 

[)  findterm(x);  VAL(x.anc)  (]; 

MD(VAL(x. leftfather))  It  rr.TRUCT  => 

BEGIN 

DECL  y:rrSET  SIZE  VAL( x . right  father . name , TLB) ; 
insert(x . rightfather , y); 

END; 

MD(VAL(x.  rightfather))  It  rrSTRUCT  z> 

BEGIN 

DECL  y : rrSET  SIZE  7AL ( x . leftfather . name .TLB ) ; 
insert ( x . leftfather , y); 

END; 

findterm( x ) ; 

disjoint ( x . leftfather . anc  , x . rightfather . anc ) => 

BEGIN 

DECL  y : rrSET  SIZE  VAL( x . name .TLB) ; 
insert ( x . leftfather , y); 
insert ( x . rightfather , y); 

END; 

VAL( x . anc  ) ; 

END; 

findterm  <- 
EXPR ( x : PTR ( rrSTRUCT ) ) 
bLGIN 

x.anc  It  NIL  =>  NIL; 

x. leftfather  = NIL  AND  x . rightfather  = NIL  r> 

BEGIN 

x.anc  <-  ALLOC ( rrSET  SIZE  VAL( x . name . TLB )) ; 
insert ( x , VAL(x.anc)); 

EK'D; 

MD(VAL(x. rightfather))  It  rrSTRUCT  => 

[)  findterm(  x.  leftfather  ) ; x.ar.c  <-  x . leftfather  . anc  (]; 
findterm (x. right  father); 

MD(VAL(x.  leftfather))  It  rrSTRUCT  => 
x.anc  <-  x . rightfather .anc  ; 
find  term ( x . lc ft fat her) ; 

x.anc  <-  union(x. leftfather. anc,  x . rightfather . anc ) ; 

END; 
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union  <- 

EXPR ( s 1 :PTR(rrSET)  , s2 : PTR ( rrSET ) ; PTR ( rrSET ) ) 

BEGIN 

LENGTH (si. members ) > LENGTH ( s2 . members ) = > union(s2,  si): 
DECL  sz : PTR ( rrSET ) BYVAL  ALLOC ( rrSET  BYVAL  VAL(s2)); 

DECL  s 1 p : PTR ( rrMEM ) BYVAL  sl.ptrs; 

REPEAT 

sip  = NIL  =>  NIL; 

insert ( s 1 p . elem , VAL(sz)); 

sip  <-  sip. next ; 

END; 
sz  ; 

END; 

disjoint  <- 

EXPR ( s 1 : PTR ( rrSET ) , s2 : PTR ( rrSET ) ; BOOL) 

BEGIN 

LENGTH(s2. members)  > LENGTH(s 1 .members)  r> 
dujoint(s2,  si); 

FOR  i FROM  1 TO  LENGTH ( s2 . members ) 

REPEAT 

s2.members[i]  AND  si .members[i]  =>  FALSE; 

TRUE; 

END ; 

END; 

subst  <- 

EXPR ( u : PTR ( rr STRUCT ) , liarrrSET  SHARED;  poly) 

BEGIN 

u element  lia  =>  polytnake(u) ; 
u.fn( BEGIN 

liD(  VAL  ( u . left  father ) ) a rrSTRUCT  => 
subst(u. left  fat  her,  lia); 

V AL ( u . lef tfather ) ; 

END, 

BEGIN 

MD( VAL(u . rightfather ) ) = rrSTRUCT  => 
sufcst(u.rightfather,  lia); 

VAL( u . rightfather) ; 

END)  ; 

END; 


calcdes 1th  <- 

EXP R . p : poly , k : INT ) 

FOR  i FROM  1 TO  LENGTH (p) 

REPEAT 

DECL  t : PTR( term ) BYVAL  p[i].var; 

FOR  j FROM  1 TO  LENGTH ( t ) 

REPEAT 

k * t[j].exp  > t[j] .var.deslth  -> 
t[J]. var.deslth  <-  k « t[j].exp; 

END; 

END; 

evalpoly  <- 

EXPR(u:PTR(rrSTRUCT) , p:poly) 

BEGIN 

DECL  pow: PTR (poly)  BYVAL 
ALLOC(poly  LIKE  p “ (u.curlth  + 1)); 

DECL  mom: VECTOR( 16 , REAL)  SHARED  u.mom; 

FOR  m FROM  u.curlth  + 1 TO  u.deslth 
REPEAT 

m //  u.curlth  + 1 -> 

pow  <-  ALLOC ( poly  LIKE  p » VAL(pow)); 
mom[m]  <-  eval term ( mom [m ] , VAL(pow))- 
END;  y 

u.curlth  <-  u.deslth; 

END; 

evalterm  <- 

EXPR (m  : REAL  , pow:poly  SHARED;  REAL) 

BEGIN 

FOR  i FROM  1 TO  LENGTH (pow) 

REPEAT 

DECL  tlterm  SHARED  VAL ( pow[ i ] . var ) ; 
m < + 

pow[ i ] . coef f * 

BEGIN 

DECL  prod : REAL  BYVAL  1.0; 

FOR  j FROM  1 TO  LENGTH ( t ) 

REPEAT 

prod  <-  prod  * t[ J ] . var  .mom[ t[ j J . exp] 
END; 
prod ; 

END; 

END; 

END; 
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makedist  <- 

EXPR { datamd : MODE , mean:REAL,  momgenf : ROUTINE  ; real\random) 

BEGIN 

DECL  z: realXrandom; 

DECL  lowerz : realXrandom . JR  SHARED  LOWER(z); 
lowerz.data  <-  ALLOC (datamd ) ; 
lowerz. momgen  <-  momgenf; 
lower z . curl th  <-  1; 
lower z . mom [ 1 ] <-  mean; 
z J 
END; 

point  <- 

EXPR(x:REAL;  realXrandom)  [)  makedist(NONE , x,  pointmomgen)  (]; 
polymake  <- 

EXPR ( x : real Xrandom  ; polv) 

BEGIN 

DECL  p : PTR ( poly ) BYVAL  ALLOC(poly  SIZE  1); 

DECL  t : PTfi( term)  BYVAL  ALLOC(term  SIZE  1); 
p[ 1 ] . coef f <-  1.0; 
p[  1 ] . var  <-  t ; 
t[  1 ] . ex  p <-  1 ; 
t[ 1 ] .var  <-  x ; 

VAL(p) ; 

END; 

pointmomgen  <- 

EXPR ( x : PTR ( rr STRUCT ) SHARED) 

BEGIN 

FOR  i FROM  x.curlth  + 1 TO  x.deslth 

REPEAT  x.momli]  <-  x.mom[1]  * x.mom[i  - 1]  END; 

x.curlth  <-  x.deslth; 

END; 

uniform  <- 

EXPR (a : REAL  , b:REAL;  realXrandom) 

BEGIN 

DECL  x : realXrandom  SHARED 

maked is t ( STRUCT ( a : REAL , b:REAL,  bpow:REAL,  sum:REAL), 

(a  + b)  / 2, 
unifcrmmomgen ) ; 

DECL  xdata : REF  BYVAL  LOWER ( x ). da ta ; 

xdata.a  <-  a; 

xdata. b <-  b; 

xdata. bpow  <-  b; 

xdata. sum  <-  a + b; 

x; 

END; 
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uniformmomgen  <- 

EXPR(x:PTR(rr STRUCT)  SHARED) 

BEGIN 

DECL  xdata:REF  BYVAL  x.data; 

FOR  i FROM  x.curlth  + 1 TO  x.deslth 
REPEAT 

xdata.bpow  <-  xdata.b  * xdata.bpow; 
xdata.sum  <-  xdata.a  * xdata.sum  + xdata.bpow; 
x.mom[i]  <-  xdata.sum  / (i  + 1); 

END; 

x.curlth  <-  x.deslth; 

END; 

nextmom  <- 

EXPR ( n : INT  , mom : VECTOR ( 1 6 , REAL)  SHARED,  m : REAL ; REAL) 
BEGIN 

DECL  u : INT  BYVAL  1; 

DECL  var : REAL  BYVAL  - mom[1]; 

DECL  sum:REAL  BYVAL  var; 

FOR  i FROM  1 TO  n - 1 
REPEAT 

u<-u*(n+1-i)/i; 

sum  <-  var  * (sum  + u * nom[i]); 

END; 

m - sum; 

END; 

gaussian  <- 

EXPR (mean : REAL  , sigma:REAL;  real\random) 

BEGIN 

DFCL  x : real\random  SHARED 
makedist (STRUCT (pow: REAL , var : REAL  ) , 
mean , 

gaussianmomgen ) ; 

DECL  xdata : REF  BYVAL  LOWER ( x ). data ; 

xdata.pow  <-  1.0; 

xdata. var  <-  sigma  * sigma; 

x; 

END; 

gaussianmomgen  <- 

EXPR ( x : PTR ( rr STRUCT ) SHARED) 

BEGIN 

DECL  xdati  : REF  BYVAL  x.data; 

FOR  i FPr"l  x.curlth  + 1 TO  x.deslth 
REPEAT 

x.mom[i]  <- 
BEGIN 

nextmom( i , 

x . mom , 

BEGIN 

i / 2 • 2 = l => 
xdata.pow  <- 

( i - 1 ) # xdata.pow  * xdata .var ; 
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0; 

END) ; 

END; 

END; 

x.curlth  <-  x.deslth; 

END; 

poissor  <- 

EXPR ( lambda : REAL ; real\random) 

BEGIN 

DECL  x : realNrandom  SHARED 

makedist( STRUCT (pi : VECTOR (17,  INT),  p2 : VECTOR ( 1 7 , INT ) ) , 
lambda , 

poissonmomg.n) ; 

DECL  xdata  : REF  BYVAL  LOWER ( x ).  da  ta  ; 

xdata  . p2[  1 ] <-  2; 

xdata . p 2 [ 2 ] <-  1 ; 

xdata  . p 1 [ 1 j <-  1 ; 

x; 

END; 

poissonmomgen  <- 

EXPR(x: PTR( rrSTRUCT)  SHARED) 

BEGIN 

DECL  xdata  : REF  BYVAL  x.data; 

DECL  p 1 : VECTOR (17,  INT)  SHARED  xdata.il; 

DECL  p2  : VECTOR (17,  INT)  SHARED  xdata. p2; 

FOR  i FROM  x.curlth  + 1 TO  x.deslth 
REPEAT 

x.mom[i]  <- 
BEGIN 

nextmom( i , 

x . mcm . 

BFGIN 

DECL  sum : RE AL  EYVAL  0.0; 

FOR  j FROM  p2[  1 ] BY  - 1 TO  2 
REPEAT 

sum  <-  (sum  + p2[j])  * x.mom[l]; 

END; 

FOR  j FROM  pit  1 ] 3Y  - 1 TO  2 

REPEAT  p1[j  + 1]  <-  p 1 [ j ] * i END; 
p 1 [ 2 J <-  0; 

FOR  j FROM  pi  [ 1 ] + 1 BY  - 1 TO  2 
REPEAT 

DECL  t, : I N T BYVAL  p 1 [ j ] ; 

pi[j]  <-  p2[ j ] ; 

p2[ j ] <-  ( j - 1 ) * p2[ j ] + t; 

END; 

BEGIN 

DECL  t : I NT  BYVAL  p 1 [ 1 ] ; 
p1[l]  <-  p2[ 1 ] ; 
p2[1]  <-  1 + t; 

END; 

sum; 

END) ; 
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END; 


END; 

x.curlth  <-  x.deslth; 

END; 

distrinution  <- 

EXPR( tab :SE0( VECTOR ( 2 , REAL));  real\random) 

BEGIN 

DECL  z : real \ random ; 

DECL  lowers: real \ random . UR  SHARED  LOWER(z); 
lowerz . data  <-  ALLOC ( MD( tab ) BYVAL  tab); 
lowerz . momgen  <-  dismomgen; 
lowerz . cur  1th  <-  0; 
z; 

END; 

dismomgen  <- 

EXPR(x:PTR( rrSTRUCT)  SHARED) 

BEGIN 

FOR  i FROM  x.curlth  + 1 TO  x.deslth 
REPEAT 

DECL  sum : REA L BYVAL  0.0; 

FOR  j FROM  1 TO  LENGTH! x .data) 

REPEAT 

sum  <-  sum  + x.data[j][1J  * x . dataf j ] [ 2 ] * i; 

END ; 

x.mora[i]  <-  sum; 

END ; 

x.curlth  x.deslth; 

END; 

buildtab  <- 

EXPR(x:SE0( REAL)  ; SEQ( VECTOR ( 2 , REAL))) 

BEGIN 

DECL  z:SEQ(VECT0R(2,  REAL))  SIZE  LENGTH ( x ) / 2; 

FOR  i FROM  1 TO  LENGTH! x)  / 2 

REPEAT  z[i][1]  <-  x[2  * i - 1);  z[i][2]  <-  x [ 2 « i]  END; 
z; 

END; 
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